2024-08-13 14:40:41,654 INFO [train_multi_KD3.py:1187] (2/4) Training started 2024-08-13 14:40:41,655 INFO [train_multi_KD3.py:1197] (2/4) Device: cuda:2 2024-08-13 14:40:41,657 INFO [train_multi_KD3.py:1212] (2/4) Using dtype=torch.bfloat16 2024-08-13 14:40:41,657 INFO [train_multi_KD3.py:1214] (2/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'e400fa3b456faf8afe0ee5bfe572946b4921a3db', 'k2-git-date': 'Sat Jul 15 04:21:50 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.9', 'icefall-git-branch': 'multi_KD_with_wenet', 'icefall-git-sha1': 'a6c2f7a4-dirty', 'icefall-git-date': 'Thu Aug 8 16:21:21 2024', 'icefall-path': '/xy/mnt/yangxiaoyu/workspace/icefall_multi_KD', 'k2-path': '/root/anaconda3/lib/python3.9/site-packages/k2/__init__.py', 'lhotse-path': '/root/anaconda3/lib/python3.9/site-packages/lhotse/__init__.py', 'hostname': 'NGK_xiaoyu'}, 'world_size': 4, 'master_port': 13440, 'tensorboard': True, 'num_epochs': 35, 'start_epoch': 16, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'stop_early': True, 'use_fp16': False, 'use_bf16': True, 'share_asr': True, 'beats_loss_scale': 1.0, 'ecapa_loss_scale': 
10.0, 'whisper_loss_scale': 1.0, 'whisper_cb_loss_scale': 0.01, 'repeat_librispeech': 5, 'repeat_wenetspeech': 0, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'speaker_input_idx': 2, 'whisper_dim': 1280, 'use_task_id': True, 'num_codebooks': 32, 'mvq_kd_layer_idx': -1, 'use_subsampled_output': True, 'delta_t': 6, 'full_libri': True, 'mini_libri': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_librispeech': True, 'use_wenetspeech': False, 'use_audioset': True, 'audioset_subset': 'unbalanced', 'use_voxceleb': True, 'voxceleb_subset': 'vox2', 'use_fma': False, 'fma_subset': 'large', 'manifest_dir': PosixPath('data/fbank_LSVoxAs_with_whisper_large-v3_with_taskID'), 'max_duration': 1500, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 1, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'enable_musan': False, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'large-v3', 'use_mert': False, 'blank_id': 0, 'vocab_size': 500, 'dtype': torch.bfloat16, 'use_amp': True} 2024-08-13 14:40:41,657 INFO [train_multi_KD3.py:1216] (2/4) About 
to create model 2024-08-13 14:40:42,081 INFO [model_shift.py:142] (2/4) Delta_t: 6 when computing the distillation loss 2024-08-13 14:40:42,085 INFO [train_multi_KD3.py:1220] (2/4) Number of model parameters: 66484678 2024-08-13 14:40:42,086 INFO [checkpoint.py:112] (2/4) Loading checkpoint from multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-15.pt 2024-08-13 14:40:44,546 INFO [train_multi_KD3.py:1235] (2/4) Using DDP 2024-08-13 14:40:46,716 INFO [train_multi_KD3.py:1247] (2/4) Loading optimizer state dict 2024-08-13 14:40:47,162 INFO [train_multi_KD3.py:1255] (2/4) Loading scheduler state dict 2024-08-13 14:40:47,162 INFO [kd_datamodule.py:690] (2/4) About to get train 960 cuts 2024-08-13 14:40:47,213 INFO [train_multi_KD3.py:1306] (2/4) Getting audioset cuts 2024-08-13 14:40:47,213 INFO [kd_datamodule.py:900] (2/4) About to get the audioset cuts for KD. 2024-08-13 14:40:47,215 INFO [kd_datamodule.py:869] (2/4) About to get the voxceleb cuts. 2024-08-13 14:40:47,216 INFO [kd_datamodule.py:880] (2/4) Adding voxceleb2 cuts. 
2024-08-13 14:40:47,217 INFO [train_multi_KD3.py:1320] (2/4) Using mux to combine Librispeech: True, WenetSpeech: False, audioset: True and voxceleb: True 2024-08-13 14:40:56,789 INFO [train_multi_KD3.py:1322] (2/4) Using mux to combine [CutSet(len=1406195) [underlying data type: ], CutSet(len=1904746) [underlying data type: ], CutSet(len=1187704) [underlying data type: ]] 2024-08-13 14:40:56,790 INFO [train_multi_KD3.py:1323] (2/4) Using weights: [1406195, 1904746, 1187704] 2024-08-13 14:40:56,790 INFO [train_multi_KD3.py:1332] (2/4) CutSet(len=4498645) [underlying data type: ] 2024-08-13 14:40:56,790 INFO [kd_datamodule.py:449] (2/4) Disable MUSAN 2024-08-13 14:40:56,791 INFO [kd_datamodule.py:489] (2/4) Disable SpecAugment 2024-08-13 14:40:56,791 INFO [kd_datamodule.py:491] (2/4) About to create train dataset 2024-08-13 14:40:56,792 INFO [kd_datamodule.py:528] (2/4) Using SimpleCutSampler 2024-08-13 14:40:56,793 INFO [kd_datamodule.py:536] (2/4) About to create train dataloader 2024-08-13 14:40:56,795 INFO [kd_datamodule.py:763] (2/4) About to get dev-clean cuts 2024-08-13 14:40:56,797 INFO [kd_datamodule.py:781] (2/4) About to get dev-other cuts 2024-08-13 14:40:56,798 INFO [kd_datamodule.py:570] (2/4) About to create dev dataset 2024-08-13 14:40:57,115 INFO [kd_datamodule.py:591] (2/4) About to create dev dataloader 2024-08-13 14:40:57,115 INFO [kd_datamodule.py:840] (2/4) About to get the test set of voxceleb1 set. 2024-08-13 14:40:57,117 INFO [kd_datamodule.py:570] (2/4) About to create dev dataset 2024-08-13 14:40:57,373 INFO [kd_datamodule.py:591] (2/4) About to create dev dataloader 2024-08-13 14:40:57,374 INFO [kd_datamodule.py:912] (2/4) About to get the audioset eval cuts. 
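The mux step logged above combines the LibriSpeech, AudioSet and VoxCeleb CutSets using their lengths (1406195, 1904746, 1187704) as sampling weights, so batches draw from each source in proportion to its size. A minimal pure-Python sketch of that weighted multiplexing (an illustration only; the run itself uses lhotse's `CutSet.mux`, whose behavior may differ in detail):

```python
import random

def mux(streams, weights, rng):
    """Yield items from several iterables, picking a stream at each step
    with probability proportional to its weight; exhausted streams are
    dropped along with their weights."""
    its = [iter(s) for s in streams]
    while its:
        i = rng.choices(range(len(its)), weights=weights)[0]
        try:
            yield next(its[i])
        except StopIteration:
            del its[i], weights[i]

rng = random.Random(42)
sizes = {
    "librispeech": 1406195,  # lengths taken from the log above
    "audioset": 1904746,
    "voxceleb": 1187704,
}
names = list(sizes)
# Short stand-in streams so the example terminates quickly.
demo = mux([[n] * 1000 for n in names], [sizes[n] for n in names], rng)
sample = [next(demo) for _ in range(300)]
counts = {n: sample.count(n) for n in names}
print(counts)  # counts roughly proportional to the weights
```

The draw probabilities here (about 0.31 / 0.42 / 0.26) match the "Using weights" line above, since lhotse normalizes the raw lengths.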
2024-08-13 14:40:57,376 INFO [kd_datamodule.py:570] (2/4) About to create dev dataset 2024-08-13 14:40:58,000 INFO [kd_datamodule.py:591] (2/4) About to create dev dataloader 2024-08-13 14:40:58,001 INFO [train_multi_KD3.py:1412] (2/4) ['ASR_libri', 'SV_voxceleb1', 'AT_audioset'] 2024-08-13 14:40:58,001 INFO [train_multi_KD3.py:1416] (2/4) Loading grad scaler state dict 2024-08-13 14:41:09,295 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 from AS 2024-08-13 14:41:13,687 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 0, loss[loss=0.112, beats_loss=0.009057, ecapa_loss=0.0001482, whisper_loss=0.1015, over 22371.00 frames. ], tot_loss[loss=0.112, beats_loss=0.009057, ecapa_loss=0.0001482, whisper_loss=0.1015, over 22371.00 frames. ], batch size: 88, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 14:41:13,687 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-13 14:41:29,036 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.6535, 2.9082, 2.0858, 3.1456], device='cuda:2') 2024-08-13 14:41:44,597 INFO [train_multi_KD3.py:1149] (2/4) Epoch 16, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.0005685, whisper_loss=0.2484, over 922467.00 frames. 2024-08-13 14:41:58,062 INFO [train_multi_KD3.py:1149] (2/4) Epoch 16, validation on SV_voxceleb1: loss=0.004519, beats_loss=0, ecapa_loss=0.0004519, whisper_loss=0, over 939242.00 frames. 
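The per-batch loss lines above combine the three distillation losses using the scales from the config dump (beats_loss_scale=1.0, ecapa_loss_scale=10.0, whisper_loss_scale=1.0). A quick sanity check against the batch-0 numbers, assuming the reported total is the scale-weighted sum (this matches the logged values but is inferred from the log, not taken from the training code):

```python
# Scales from the config dump at the top of the log.
BEATS_SCALE, ECAPA_SCALE, WHISPER_SCALE = 1.0, 10.0, 1.0

def total_loss(beats, ecapa, whisper):
    """Scale-weighted sum of the three distillation losses."""
    return BEATS_SCALE * beats + ECAPA_SCALE * ecapa + WHISPER_SCALE * whisper

# Numbers from the "Epoch 16, batch 0" line:
# loss=0.112, beats_loss=0.009057, ecapa_loss=0.0001482, whisper_loss=0.1015
t = total_loss(0.009057, 0.0001482, 0.1015)
print(round(t, 3))  # 0.112, agreeing with the logged loss
```

Note that ecapa_loss is reported unscaled in the log; only the combined total reflects the ×10 weighting.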
2024-08-13 14:42:08,291 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.6857, 1.7412, 1.7992, 1.8046, 2.3541, 1.7417, 2.0171, 1.6669], device='cuda:2') 2024-08-13 14:43:00,997 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.8314, 1.0667, 1.2373, 0.4890, 0.8594, 1.2005, 0.8398, 0.7504], device='cuda:2') 2024-08-13 14:43:30,549 INFO [train_multi_KD3.py:1149] (2/4) Epoch 16, validation on AT_audioset: loss=0.02374, beats_loss=0.02374, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 14:43:30,550 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-13 14:43:31,033 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=7.690e-03 2024-08-13 14:44:14,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2173910.0, ans=0.1 2024-08-13 14:44:21,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2173910.0, ans=0.125 2024-08-13 14:44:32,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2174010.0, ans=0.125 2024-08-13 14:44:53,326 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
24 from LS+wenet, 12 from Vox, 31 from AS 2024-08-13 14:45:00,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2174010.0, ans=0.0 2024-08-13 14:45:28,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2174210.0, ans=0.125 2024-08-13 14:45:32,717 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 14:45:59,449 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 50, loss[loss=0.112, beats_loss=0.006107, ecapa_loss=0.0002244, whisper_loss=0.1037, over 15332.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01019, ecapa_loss=0.0001671, whisper_loss=0.09067, over 883340.90 frames. ], batch size: 61, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 14:46:29,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2174310.0, ans=0.04949747468305833 2024-08-13 14:46:38,623 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.624e+01 2.896e+01 3.246e+01 4.521e+01, threshold=5.792e+01, percent-clipped=0.0 2024-08-13 14:46:50,213 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=12.11 vs. limit=12.0 2024-08-13 14:47:22,113 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.30 vs. limit=22.5 2024-08-13 14:49:11,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2174810.0, ans=0.0 2024-08-13 14:49:14,391 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 100, loss[loss=0.1161, beats_loss=0.009002, ecapa_loss=0.0001612, whisper_loss=0.1055, over 17106.00 frames. 
], tot_loss[loss=0.1009, beats_loss=0.01011, ecapa_loss=0.0001678, whisper_loss=0.08916, over 1547475.63 frames. ], batch size: 67, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 14:50:14,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2174910.0, ans=0.1 2024-08-13 14:51:25,570 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.47 vs. limit=15.0 2024-08-13 14:52:27,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2175210.0, ans=0.2 2024-08-13 14:52:32,781 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 150, loss[loss=0.0849, beats_loss=0.01279, ecapa_loss=0.0001544, whisper_loss=0.07057, over 19803.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.009984, ecapa_loss=0.0001671, whisper_loss=0.08977, over 2055120.79 frames. ], batch size: 81, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 14:53:01,443 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.87 vs. limit=15.0 2024-08-13 14:53:04,338 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 23 from Vox, 45 from AS 2024-08-13 14:53:06,880 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.105e+01 2.627e+01 2.921e+01 3.180e+01 8.449e+01, threshold=5.841e+01, percent-clipped=2.0 2024-08-13 14:53:41,797 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.39 vs. limit=10.0 2024-08-13 14:54:00,081 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.95 vs. 
limit=22.5 2024-08-13 14:54:33,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2175610.0, ans=0.0 2024-08-13 14:54:36,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2175710.0, ans=0.04949747468305833 2024-08-13 14:54:46,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2175710.0, ans=0.125 2024-08-13 14:54:50,373 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 19 from Vox, 29 from AS 2024-08-13 14:54:57,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2175710.0, ans=0.125 2024-08-13 14:55:05,150 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 200, loss[loss=0.1071, beats_loss=0.01086, ecapa_loss=0.0001431, whisper_loss=0.09481, over 21593.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01009, ecapa_loss=0.0001664, whisper_loss=0.08955, over 2437816.93 frames. ], batch size: 84, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 14:55:06,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2175810.0, ans=0.125 2024-08-13 14:55:25,771 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 14:55:26,732 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 13 from Vox, 24 from AS 2024-08-13 14:55:29,378 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.14 vs. limit=15.0 2024-08-13 14:55:29,994 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
33 from LS+wenet, 24 from Vox, 34 from AS 2024-08-13 14:55:42,930 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 19 from Vox, 38 from AS 2024-08-13 14:55:46,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2176010.0, ans=0.2 2024-08-13 14:55:48,612 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.30 vs. limit=22.5 2024-08-13 14:56:00,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2176110.0, ans=0.125 2024-08-13 14:56:00,754 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.04 vs. limit=22.5 2024-08-13 14:56:01,841 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 18 from Vox, 37 from AS 2024-08-13 14:56:06,818 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.51 vs. limit=12.0 2024-08-13 14:56:18,420 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 from AS 2024-08-13 14:56:28,162 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.65 vs. limit=15.0 2024-08-13 14:56:28,325 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.59 vs. 
limit=15.0 2024-08-13 14:56:29,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2176210.0, ans=0.0 2024-08-13 14:56:29,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2176210.0, ans=0.125 2024-08-13 14:56:32,516 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 250, loss[loss=0.1192, beats_loss=0.01119, ecapa_loss=0.000137, whisper_loss=0.1067, over 22742.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01021, ecapa_loss=0.0001646, whisper_loss=0.09066, over 2709979.44 frames. ], batch size: 89, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 14:56:33,571 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.61 vs. limit=8.0 2024-08-13 14:56:40,163 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 25 from LS+wenet, 19 from Vox, 22 from AS 2024-08-13 14:56:44,012 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 23 from Vox, 44 from AS 2024-08-13 14:56:47,763 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.702e+01 2.294e+01 2.573e+01 2.919e+01 5.746e+01, threshold=5.146e+01, percent-clipped=0.0 2024-08-13 14:57:11,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2176510.0, ans=0.0 2024-08-13 14:57:25,177 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 28 from LS+wenet, 27 from Vox, 40 from AS 2024-08-13 14:57:27,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2176610.0, ans=0.0 2024-08-13 14:57:32,812 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.93 vs. 
limit=15.0 2024-08-13 14:57:34,171 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.70 vs. limit=12.0 2024-08-13 14:57:39,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2176710.0, ans=10.0 2024-08-13 14:57:47,346 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 24 from Vox, 31 from AS 2024-08-13 14:57:54,992 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 16 from Vox, 33 from AS 2024-08-13 14:57:56,051 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 300, loss[loss=0.09435, beats_loss=0.01167, ecapa_loss=0.0001646, whisper_loss=0.08104, over 17183.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01031, ecapa_loss=0.0001649, whisper_loss=0.09103, over 2973283.94 frames. ], batch size: 70, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 14:58:08,996 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.84 vs. limit=15.0 2024-08-13 14:58:13,672 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 27 from LS+wenet, 15 from Vox, 32 from AS 2024-08-13 14:59:15,949 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 350, loss[loss=0.1004, beats_loss=0.01187, ecapa_loss=0.0001416, whisper_loss=0.0871, over 13924.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01041, ecapa_loss=0.0001631, whisper_loss=0.09094, over 3152900.64 frames. 
], batch size: 57, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 14:59:16,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2177310.0, ans=0.2 2024-08-13 14:59:31,454 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.361e+01 2.664e+01 2.951e+01 4.705e+01, threshold=5.328e+01, percent-clipped=0.0 2024-08-13 15:00:13,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2177610.0, ans=0.125 2024-08-13 15:00:27,553 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 22 from LS+wenet, 26 from Vox, 35 from AS 2024-08-13 15:00:28,870 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 28 from LS+wenet, 18 from Vox, 25 from AS 2024-08-13 15:00:32,686 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 400, loss[loss=0.1153, beats_loss=0.01146, ecapa_loss=0.0001541, whisper_loss=0.1023, over 22337.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01041, ecapa_loss=0.0001626, whisper_loss=0.09169, over 3305471.42 frames. ], batch size: 88, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:00:48,754 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 23 from Vox, 17 from AS 2024-08-13 15:01:00,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2177910.0, ans=0.0 2024-08-13 15:01:11,953 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.80 vs. limit=15.0 2024-08-13 15:01:21,292 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.83 vs. limit=6.0 2024-08-13 15:01:25,074 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
19 from LS+wenet, 18 from Vox, 18 from AS 2024-08-13 15:01:37,560 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 14 from Vox, 25 from AS 2024-08-13 15:01:47,808 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 450, loss[loss=0.1078, beats_loss=0.01001, ecapa_loss=0.0001572, whisper_loss=0.09618, over 14001.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01037, ecapa_loss=0.0001629, whisper_loss=0.09171, over 3392083.11 frames. ], batch size: 54, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:01:55,269 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 16 from Vox, 24 from AS 2024-08-13 15:01:55,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2178310.0, ans=0.125 2024-08-13 15:01:58,055 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 15 from LS+wenet, 18 from Vox, 27 from AS 2024-08-13 15:02:02,098 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.405e+01 2.560e+01 2.967e+01 1.017e+02, threshold=5.120e+01, percent-clipped=1.0 2024-08-13 15:02:02,315 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 22 from LS+wenet, 18 from Vox, 19 from AS 2024-08-13 15:02:07,030 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.57 vs. limit=22.5 2024-08-13 15:02:21,135 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 31 from Vox, 33 from AS 2024-08-13 15:02:36,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2178610.0, ans=0.125 2024-08-13 15:03:00,352 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 500, loss[loss=0.1134, beats_loss=0.009275, ecapa_loss=0.0001671, whisper_loss=0.1025, over 22423.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0104, ecapa_loss=0.0001623, whisper_loss=0.09161, over 3489309.63 frames. 
], batch size: 89, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:03:01,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2178810.0, ans=0.125 2024-08-13 15:03:10,083 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 15 from Vox, 33 from AS 2024-08-13 15:03:41,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=2179010.0, ans=0.05 2024-08-13 15:03:43,323 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.03 vs. limit=5.0 2024-08-13 15:03:43,891 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 21 from LS+wenet, 20 from Vox, 41 from AS 2024-08-13 15:03:51,476 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 13 from Vox, 45 from AS 2024-08-13 15:04:00,921 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 29 from LS+wenet, 20 from Vox, 28 from AS 2024-08-13 15:04:14,722 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 550, loss[loss=0.1024, beats_loss=0.01201, ecapa_loss=0.0001497, whisper_loss=0.08886, over 23155.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01047, ecapa_loss=0.0001607, whisper_loss=0.09173, over 3582949.09 frames. 
], batch size: 92, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:04:29,275 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.286e+01 2.542e+01 2.908e+01 4.014e+01, threshold=5.083e+01, percent-clipped=0.0 2024-08-13 15:04:34,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2179410.0, ans=0.0 2024-08-13 15:04:35,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2179410.0, ans=0.125 2024-08-13 15:04:45,310 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.61 vs. limit=15.0 2024-08-13 15:04:56,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2179510.0, ans=0.125 2024-08-13 15:04:59,734 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=15.0 2024-08-13 15:05:05,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2179610.0, ans=0.1 2024-08-13 15:05:15,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2179710.0, ans=0.0 2024-08-13 15:05:27,368 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 27 from LS+wenet, 15 from Vox, 23 from AS 2024-08-13 15:05:29,019 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 600, loss[loss=0.1318, beats_loss=0.008771, ecapa_loss=0.0001469, whisper_loss=0.1215, over 17982.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01038, ecapa_loss=0.0001616, whisper_loss=0.09209, over 3635017.84 frames. ], batch size: 65, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:05:43,845 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
23 from LS+wenet, 12 from Vox, 22 from AS 2024-08-13 15:05:45,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2179910.0, ans=0.125 2024-08-13 15:05:58,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2180010.0, ans=0.2 2024-08-13 15:06:07,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2180010.0, ans=0.125 2024-08-13 15:06:14,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2180110.0, ans=0.07 2024-08-13 15:06:35,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2180210.0, ans=0.0 2024-08-13 15:06:40,922 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 650, loss[loss=0.0948, beats_loss=0.01171, ecapa_loss=0.0001524, whisper_loss=0.08156, over 19868.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01044, ecapa_loss=0.0001619, whisper_loss=0.09157, over 3679159.47 frames. ], batch size: 78, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:06:52,986 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 13 from LS+wenet, 14 from Vox, 29 from AS 2024-08-13 15:06:55,577 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.462e+01 2.734e+01 3.167e+01 1.676e+02, threshold=5.468e+01, percent-clipped=3.0 2024-08-13 15:07:32,709 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 15 from Vox, 25 from AS 2024-08-13 15:07:36,660 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 from AS 2024-08-13 15:07:45,639 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
23 from LS+wenet, 15 from Vox, 32 from AS 2024-08-13 15:07:46,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2180710.0, ans=0.125 2024-08-13 15:07:47,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2180710.0, ans=0.125 2024-08-13 15:07:51,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2180710.0, ans=0.0 2024-08-13 15:07:54,023 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 700, loss[loss=0.09638, beats_loss=0.01124, ecapa_loss=0.0001414, whisper_loss=0.08372, over 15626.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0105, ecapa_loss=0.000162, whisper_loss=0.09171, over 3709743.98 frames. ], batch size: 59, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:07:54,822 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.02 vs. limit=15.0 2024-08-13 15:08:01,207 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 18 from Vox, 24 from AS 2024-08-13 15:08:04,585 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
20 from LS+wenet, 17 from Vox, 30 from AS 2024-08-13 15:08:16,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2180910.0, ans=0.125 2024-08-13 15:08:18,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2180910.0, ans=0.125 2024-08-13 15:08:24,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2181010.0, ans=0.125 2024-08-13 15:09:06,676 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 750, loss[loss=0.09363, beats_loss=0.01163, ecapa_loss=0.0001471, whisper_loss=0.08053, over 16079.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01056, ecapa_loss=0.0001615, whisper_loss=0.09138, over 3724889.35 frames. ], batch size: 63, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:09:09,994 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 19 from Vox, 29 from AS 2024-08-13 15:09:21,589 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.329e+01 2.542e+01 2.977e+01 1.085e+02, threshold=5.083e+01, percent-clipped=1.0 2024-08-13 15:09:26,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2181410.0, ans=0.0 2024-08-13 15:09:28,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2181410.0, ans=0.2 2024-08-13 15:09:39,341 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
33 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 15:09:41,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2181510.0, ans=0.2 2024-08-13 15:09:46,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2181510.0, ans=0.0 2024-08-13 15:10:18,928 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 800, loss[loss=0.07902, beats_loss=0.01275, ecapa_loss=0.0001264, whisper_loss=0.06501, over 14812.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01069, ecapa_loss=0.0001611, whisper_loss=0.09055, over 3763608.32 frames. ], batch size: 58, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:10:26,391 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 37 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-13 15:10:37,437 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.04 vs. limit=10.0 2024-08-13 15:10:48,754 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0 2024-08-13 15:10:49,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2182010.0, ans=0.0 2024-08-13 15:11:15,742 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.76 vs. 
limit=12.0 2024-08-13 15:11:19,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2182210.0, ans=0.125 2024-08-13 15:11:27,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2182210.0, ans=0.1 2024-08-13 15:11:32,351 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 850, loss[loss=0.09017, beats_loss=0.007981, ecapa_loss=0.0001684, whisper_loss=0.0805, over 14628.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01064, ecapa_loss=0.0001612, whisper_loss=0.09051, over 3794458.17 frames. ], batch size: 58, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:11:46,253 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.398e+01 2.663e+01 2.990e+01 7.176e+01, threshold=5.326e+01, percent-clipped=1.0 2024-08-13 15:11:51,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2182410.0, ans=0.125 2024-08-13 15:11:56,850 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-13 15:12:07,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2182510.0, ans=0.0 2024-08-13 15:12:07,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2182510.0, ans=0.05 2024-08-13 15:12:26,055 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 15:12:33,422 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. 
limit=6.0 2024-08-13 15:12:44,501 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 900, loss[loss=0.1239, beats_loss=0.006507, ecapa_loss=0.0002166, whisper_loss=0.1152, over 16021.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01059, ecapa_loss=0.0001608, whisper_loss=0.09069, over 3777833.13 frames. ], batch size: 64, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:12:53,925 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.72 vs. limit=6.0 2024-08-13 15:13:03,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2182910.0, ans=0.0 2024-08-13 15:13:05,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2182910.0, ans=0.0 2024-08-13 15:13:12,531 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 20 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-13 15:13:23,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2183010.0, ans=0.2 2024-08-13 15:13:28,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2183110.0, ans=0.125 2024-08-13 15:13:44,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2183210.0, ans=0.125 2024-08-13 15:13:48,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2183210.0, ans=0.0 2024-08-13 15:13:52,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2183210.0, ans=0.2 2024-08-13 15:13:59,040 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 950, loss[loss=0.08952, beats_loss=0.01038, ecapa_loss=0.0001193, whisper_loss=0.07795, over 14729.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01054, ecapa_loss=0.00016, whisper_loss=0.09054, over 3795653.46 frames. ], batch size: 54, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:14:08,185 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-13 15:14:13,374 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.10 vs. limit=12.0 2024-08-13 15:14:13,557 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.387e+01 2.716e+01 2.954e+01 4.081e+01, threshold=5.431e+01, percent-clipped=0.0 2024-08-13 15:14:22,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2183410.0, ans=0.0 2024-08-13 15:14:24,885 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=15.0 2024-08-13 15:14:33,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2183510.0, ans=0.0 2024-08-13 15:14:50,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2183610.0, ans=0.125 2024-08-13 15:14:55,223 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-13 15:14:55,497 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 15:15:14,399 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 1000, loss[loss=0.1033, beats_loss=0.01169, ecapa_loss=0.0001604, whisper_loss=0.09, over 18771.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0106, ecapa_loss=0.0001593, whisper_loss=0.09045, over 3819990.33 frames. 
], batch size: 77, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:15:20,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2183810.0, ans=10.0 2024-08-13 15:15:30,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2183910.0, ans=0.2 2024-08-13 15:15:36,237 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 15:15:49,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2184010.0, ans=0.0 2024-08-13 15:15:56,763 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-13 15:16:01,232 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 29 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-13 15:16:03,560 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.14 vs. limit=10.0 2024-08-13 15:16:29,323 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 1050, loss[loss=0.08447, beats_loss=0.01084, ecapa_loss=0.000224, whisper_loss=0.0714, over 15803.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0106, ecapa_loss=0.0001593, whisper_loss=0.09054, over 3824613.13 frames. ], batch size: 70, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:16:39,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2184310.0, ans=0.125 2024-08-13 15:16:42,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=2184310.0, ans=15.0 2024-08-13 15:16:42,989 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
24 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-13 15:16:43,885 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.435e+01 2.686e+01 3.027e+01 6.105e+01, threshold=5.372e+01, percent-clipped=2.0 2024-08-13 15:16:45,122 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.11 vs. limit=15.0 2024-08-13 15:16:49,128 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 15:17:12,906 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 30 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 15:17:23,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2184610.0, ans=0.125 2024-08-13 15:17:31,357 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2184710.0, ans=0.035 2024-08-13 15:17:32,354 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 15:17:43,787 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 1100, loss[loss=0.08114, beats_loss=0.01248, ecapa_loss=0.0001394, whisper_loss=0.06727, over 17038.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01067, ecapa_loss=0.0001603, whisper_loss=0.09056, over 3859008.94 frames. ], batch size: 66, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:17:43,969 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-13 15:17:48,606 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 15 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-13 15:17:53,434 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 15:18:20,010 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
21 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-13 15:18:28,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2185010.0, ans=0.125 2024-08-13 15:18:49,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2185210.0, ans=0.2 2024-08-13 15:19:00,545 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 1150, loss[loss=0.1013, beats_loss=0.01231, ecapa_loss=0.0001291, whisper_loss=0.0877, over 21784.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01071, ecapa_loss=0.0001592, whisper_loss=0.09057, over 3862117.94 frames. ], batch size: 87, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:19:08,265 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.13 vs. limit=22.5 2024-08-13 15:19:16,615 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.495e+01 2.743e+01 3.086e+01 4.866e+01, threshold=5.485e+01, percent-clipped=0.0 2024-08-13 15:19:17,406 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=15.0 2024-08-13 15:19:34,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2185510.0, ans=0.015 2024-08-13 15:20:30,006 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-13 15:20:45,415 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 1200, loss[loss=0.1108, beats_loss=0.01099, ecapa_loss=0.0001474, whisper_loss=0.09829, over 16420.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01071, ecapa_loss=0.0001591, whisper_loss=0.09023, over 3842761.19 frames. 
], batch size: 63, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:21:10,575 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2024-08-13 15:21:13,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=2185910.0, ans=15.0 2024-08-13 15:21:18,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2186010.0, ans=0.2 2024-08-13 15:21:21,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2186010.0, ans=0.125 2024-08-13 15:21:29,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2186010.0, ans=0.125 2024-08-13 15:21:42,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2186110.0, ans=0.125 2024-08-13 15:21:42,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2186110.0, ans=0.125 2024-08-13 15:21:43,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2186110.0, ans=0.0 2024-08-13 15:22:01,300 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 19 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-13 15:22:07,681 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 1250, loss[loss=0.1003, beats_loss=0.009822, ecapa_loss=0.000181, whisper_loss=0.08866, over 22065.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01086, ecapa_loss=0.0001588, whisper_loss=0.08949, over 3851638.23 frames. 
], batch size: 93, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:22:08,710 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2024-08-13 15:22:21,634 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-13 15:22:22,791 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.204e+01 2.472e+01 2.749e+01 3.995e+01, threshold=4.944e+01, percent-clipped=0.0 2024-08-13 15:22:27,452 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 33 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 15:22:43,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2186510.0, ans=0.2 2024-08-13 15:22:47,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2186510.0, ans=0.125 2024-08-13 15:22:53,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2186610.0, ans=0.125 2024-08-13 15:22:59,160 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-13 15:23:01,237 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.67 vs. limit=15.0 2024-08-13 15:23:05,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2186610.0, ans=0.1 2024-08-13 15:23:25,581 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 1300, loss[loss=0.1046, beats_loss=0.01061, ecapa_loss=0.0001456, whisper_loss=0.09257, over 17691.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01088, ecapa_loss=0.0001586, whisper_loss=0.08887, over 3845738.63 frames. 
], batch size: 68, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:23:36,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2186810.0, ans=0.0 2024-08-13 15:23:45,247 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=9.100e-01 2024-08-13 15:23:48,854 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-13 15:24:41,947 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 1350, loss[loss=0.1065, beats_loss=0.01314, ecapa_loss=0.0001172, whisper_loss=0.09216, over 20159.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01093, ecapa_loss=0.0001584, whisper_loss=0.0886, over 3830816.28 frames. ], batch size: 77, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:25:00,081 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.385e+01 2.728e+01 3.101e+01 1.009e+02, threshold=5.456e+01, percent-clipped=3.0 2024-08-13 15:25:09,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2187410.0, ans=0.2 2024-08-13 15:25:16,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2187510.0, ans=0.125 2024-08-13 15:25:32,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2187610.0, ans=0.0 2024-08-13 15:25:35,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2187610.0, ans=0.1 2024-08-13 15:25:48,010 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
24 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 15:25:57,893 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 1400, loss[loss=0.09576, beats_loss=0.009162, ecapa_loss=0.000134, whisper_loss=0.08526, over 15040.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01082, ecapa_loss=0.0001595, whisper_loss=0.08875, over 3789134.74 frames. ], batch size: 55, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:26:14,217 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 30 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-13 15:26:17,117 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-13 15:26:20,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2187910.0, ans=0.0 2024-08-13 15:26:21,264 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 25 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-13 15:26:21,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2187910.0, ans=0.0 2024-08-13 15:26:27,036 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.94 vs. limit=15.0 2024-08-13 15:26:35,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2188010.0, ans=0.125 2024-08-13 15:26:37,032 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.57 vs. limit=15.0 2024-08-13 15:26:38,610 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.00 vs. 
limit=15.0 2024-08-13 15:26:49,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2188110.0, ans=0.125 2024-08-13 15:26:56,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2188210.0, ans=0.125 2024-08-13 15:27:23,879 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 1450, loss[loss=0.09414, beats_loss=0.0129, ecapa_loss=0.0001365, whisper_loss=0.07988, over 15742.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01068, ecapa_loss=0.0001599, whisper_loss=0.08991, over 3792865.65 frames. ], batch size: 63, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:27:37,077 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-13 15:27:40,701 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.325e+01 2.552e+01 2.880e+01 5.017e+01, threshold=5.104e+01, percent-clipped=1.0 2024-08-13 15:27:44,170 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-13 15:27:58,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=2188510.0, ans=0.5 2024-08-13 15:28:11,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=2188610.0, ans=15.0 2024-08-13 15:28:12,021 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 
25 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-13 15:28:20,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2188610.0, ans=0.125 2024-08-13 15:28:30,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2188710.0, ans=0.0 2024-08-13 15:28:32,884 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 33 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-13 15:28:39,764 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 15:28:43,958 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 1500, loss[loss=0.1055, beats_loss=0.01078, ecapa_loss=0.0001308, whisper_loss=0.09343, over 21696.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01078, ecapa_loss=0.000159, whisper_loss=0.08903, over 3796175.82 frames. ], batch size: 86, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:28:50,658 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-13 15:29:43,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2189110.0, ans=0.125 2024-08-13 15:29:45,364 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=15.0 2024-08-13 15:29:47,783 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-13 15:30:01,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2189210.0, ans=0.125 2024-08-13 15:30:04,739 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 1550, loss[loss=0.1066, beats_loss=0.009757, ecapa_loss=0.0001516, whisper_loss=0.09528, over 19915.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.01069, ecapa_loss=0.0001598, whisper_loss=0.08939, over 3783993.99 frames. ], batch size: 75, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:30:12,406 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 31 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-13 15:30:21,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2189410.0, ans=0.125 2024-08-13 15:30:23,700 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.750e+01 2.248e+01 2.490e+01 2.864e+01 4.046e+01, threshold=4.981e+01, percent-clipped=0.0 2024-08-13 15:30:58,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=2189610.0, ans=0.1 2024-08-13 15:31:12,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2189710.0, ans=0.0 2024-08-13 15:31:17,396 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 23 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-13 15:31:18,690 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-13 15:31:23,141 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-13 15:31:26,984 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 1600, loss[loss=0.06476, beats_loss=0.01452, ecapa_loss=0.0001607, whisper_loss=0.04863, over 16538.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01068, ecapa_loss=0.0001583, whisper_loss=0.09006, over 3820418.75 frames. 
], batch size: 71, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:31:28,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2189810.0, ans=0.1 2024-08-13 15:31:33,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2189810.0, ans=0.0 2024-08-13 15:31:46,399 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 23 from LS+wenet, 25 from Vox, 22 fro AS 2024-08-13 15:32:05,621 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-13 15:32:12,338 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-13 15:32:16,747 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0 2024-08-13 15:32:30,121 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 15 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-13 15:32:35,437 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.56 vs. limit=15.0 2024-08-13 15:32:39,201 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 23 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-13 15:32:46,628 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 1650, loss[loss=0.1028, beats_loss=0.00943, ecapa_loss=0.0001822, whisper_loss=0.09153, over 15974.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01065, ecapa_loss=0.0001595, whisper_loss=0.0904, over 3824842.46 frames. ], batch size: 63, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:32:48,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2190310.0, ans=0.125 2024-08-13 15:32:53,818 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
29 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-13 15:32:58,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2190310.0, ans=0.2 2024-08-13 15:33:03,890 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.370e+01 2.654e+01 3.120e+01 7.882e+01, threshold=5.308e+01, percent-clipped=3.0 2024-08-13 15:33:17,542 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 28 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-13 15:33:35,446 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.479e+01 2024-08-13 15:33:39,815 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 28 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-13 15:33:44,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2190610.0, ans=0.125 2024-08-13 15:34:03,572 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 22 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-13 15:34:04,929 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 1700, loss[loss=0.09377, beats_loss=0.01319, ecapa_loss=0.0001436, whisper_loss=0.07915, over 20506.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01046, ecapa_loss=0.0001597, whisper_loss=0.09109, over 3779538.01 frames. ], batch size: 82, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:34:13,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2190810.0, ans=0.0 2024-08-13 15:34:27,318 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-13 15:34:31,900 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
25 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-13 15:34:40,999 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=15.0 2024-08-13 15:34:43,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2191010.0, ans=0.125 2024-08-13 15:34:50,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2191110.0, ans=0.125 2024-08-13 15:34:52,300 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.77 vs. limit=15.0 2024-08-13 15:35:04,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2191110.0, ans=0.1 2024-08-13 15:35:21,442 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 1750, loss[loss=0.1368, beats_loss=0.006634, ecapa_loss=0.0001908, whisper_loss=0.1283, over 20324.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01057, ecapa_loss=0.0001594, whisper_loss=0.09021, over 3780362.37 frames. ], batch size: 79, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:35:23,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2191310.0, ans=0.0 2024-08-13 15:35:27,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2191310.0, ans=0.2 2024-08-13 15:35:27,976 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.86 vs. 
limit=15.0 2024-08-13 15:35:31,721 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2191310.0, ans=0.0 2024-08-13 15:35:37,240 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.448e+01 2.728e+01 3.089e+01 6.360e+01, threshold=5.456e+01, percent-clipped=3.0 2024-08-13 15:35:51,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2191510.0, ans=0.1 2024-08-13 15:36:00,423 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-13 15:36:10,965 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 26 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 15:36:16,759 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-13 15:36:35,821 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 1800, loss[loss=0.09435, beats_loss=0.01153, ecapa_loss=0.0001652, whisper_loss=0.08117, over 15608.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01062, ecapa_loss=0.0001596, whisper_loss=0.09029, over 3791273.99 frames. 
], batch size: 64, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:36:43,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2191810.0, ans=0.125 2024-08-13 15:36:52,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2191910.0, ans=0.125 2024-08-13 15:36:56,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2191910.0, ans=0.125 2024-08-13 15:37:16,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2192010.0, ans=0.125 2024-08-13 15:37:16,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2192010.0, ans=0.125 2024-08-13 15:37:35,378 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 36 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-13 15:37:41,817 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 23 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-13 15:37:49,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2192310.0, ans=0.125 2024-08-13 15:37:50,776 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 1850, loss[loss=0.1087, beats_loss=0.009229, ecapa_loss=0.0001765, whisper_loss=0.09775, over 22000.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01071, ecapa_loss=0.0001601, whisper_loss=0.09013, over 3822951.10 frames. 
], batch size: 87, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:37:57,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2192310.0, ans=0.125 2024-08-13 15:38:06,967 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.440e+01 2.626e+01 2.890e+01 6.922e+01, threshold=5.252e+01, percent-clipped=1.0 2024-08-13 15:38:13,225 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 18 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-13 15:38:17,328 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-13 15:39:02,435 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-13 15:39:03,420 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 1900, loss[loss=0.1149, beats_loss=0.012, ecapa_loss=0.0001455, whisper_loss=0.1015, over 18527.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01075, ecapa_loss=0.0001602, whisper_loss=0.08946, over 3811615.56 frames. ], batch size: 73, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:39:22,890 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-13 15:39:28,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2192910.0, ans=0.125 2024-08-13 15:39:41,310 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-13 15:39:44,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=2193010.0, ans=0.1 2024-08-13 15:39:48,936 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-13 15:39:57,612 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
27 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-13 15:40:01,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2193110.0, ans=0.125 2024-08-13 15:40:18,155 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 1950, loss[loss=0.1088, beats_loss=0.01009, ecapa_loss=0.0001725, whisper_loss=0.09694, over 14601.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0107, ecapa_loss=0.0001599, whisper_loss=0.09027, over 3789950.50 frames. ], batch size: 57, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:40:20,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2193310.0, ans=0.125 2024-08-13 15:40:30,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2193310.0, ans=0.1 2024-08-13 15:40:34,163 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.351e+01 2.582e+01 2.888e+01 8.249e+01, threshold=5.164e+01, percent-clipped=1.0 2024-08-13 15:40:34,398 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 40 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 15:40:34,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2193410.0, ans=0.125 2024-08-13 15:40:42,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2193410.0, ans=0.1 2024-08-13 15:40:42,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2193410.0, ans=0.0 2024-08-13 15:40:42,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2193410.0, ans=0.125 2024-08-13 15:40:58,914 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
20 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-13 15:41:06,492 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-13 15:41:23,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2193710.0, ans=0.1 2024-08-13 15:41:33,082 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 2000, loss[loss=0.09886, beats_loss=0.01071, ecapa_loss=0.0001422, whisper_loss=0.08673, over 15566.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01069, ecapa_loss=0.0001603, whisper_loss=0.0904, over 3816074.71 frames. ], batch size: 62, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:41:56,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2193910.0, ans=0.5 2024-08-13 15:41:57,929 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.59 vs. limit=22.5 2024-08-13 15:42:00,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2193910.0, ans=0.125 2024-08-13 15:42:21,376 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-13 15:42:26,448 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.41 vs. 
limit=15.0 2024-08-13 15:42:27,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2194110.0, ans=0.0 2024-08-13 15:42:35,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2194210.0, ans=0.0 2024-08-13 15:42:38,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2194210.0, ans=0.1 2024-08-13 15:42:38,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2194210.0, ans=0.0 2024-08-13 15:42:45,336 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 11 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-13 15:42:47,766 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 2050, loss[loss=0.1256, beats_loss=0.008583, ecapa_loss=0.0001473, whisper_loss=0.1155, over 23699.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0107, ecapa_loss=0.0001609, whisper_loss=0.08984, over 3812597.46 frames. ], batch size: 90, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:42:50,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2194310.0, ans=0.125 2024-08-13 15:43:00,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2194310.0, ans=0.07 2024-08-13 15:43:02,150 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.12 vs. limit=5.0 2024-08-13 15:43:03,943 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.358e+01 2.622e+01 3.012e+01 4.492e+01, threshold=5.245e+01, percent-clipped=0.0 2024-08-13 15:43:06,808 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. 
limit=6.0 2024-08-13 15:43:11,513 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.34 vs. limit=15.0 2024-08-13 15:43:12,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2194410.0, ans=0.125 2024-08-13 15:43:18,778 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.38 vs. limit=22.5 2024-08-13 15:43:28,775 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-13 15:43:29,456 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.73 vs. limit=15.0 2024-08-13 15:43:36,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2194610.0, ans=0.0 2024-08-13 15:43:37,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2194610.0, ans=0.2 2024-08-13 15:43:47,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2194710.0, ans=0.125 2024-08-13 15:43:51,694 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 35 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 15:44:02,128 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 2100, loss[loss=0.09322, beats_loss=0.009229, ecapa_loss=0.0001881, whisper_loss=0.08211, over 14725.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01076, ecapa_loss=0.0001598, whisper_loss=0.09029, over 3825186.57 frames. ], batch size: 58, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:44:20,856 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
37 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 15:44:36,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2195010.0, ans=0.125 2024-08-13 15:44:36,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2195010.0, ans=0.07 2024-08-13 15:44:44,274 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 15:44:46,975 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-13 15:44:55,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2195110.0, ans=0.1 2024-08-13 15:45:01,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2195210.0, ans=0.125 2024-08-13 15:45:14,449 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 2150, loss[loss=0.1079, beats_loss=0.01117, ecapa_loss=0.0001562, whisper_loss=0.09514, over 18989.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01084, ecapa_loss=0.0001587, whisper_loss=0.09012, over 3814016.16 frames. 
], batch size: 73, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:45:19,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2195310.0, ans=0.0 2024-08-13 15:45:30,867 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.427e+01 2.711e+01 3.071e+01 5.101e+01, threshold=5.422e+01, percent-clipped=0.0 2024-08-13 15:45:42,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2195410.0, ans=0.025 2024-08-13 15:45:50,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2195510.0, ans=0.0 2024-08-13 15:46:12,227 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.34 vs. limit=15.0 2024-08-13 15:46:29,486 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 2200, loss[loss=0.124, beats_loss=0.007342, ecapa_loss=0.0002113, whisper_loss=0.1146, over 17504.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01084, ecapa_loss=0.0001584, whisper_loss=0.091, over 3823879.61 frames. ], batch size: 72, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:46:41,192 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-13 15:46:41,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2195810.0, ans=0.1 2024-08-13 15:46:56,776 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-13 15:47:01,402 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
12 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-13 15:47:45,303 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 2250, loss[loss=0.1173, beats_loss=0.008456, ecapa_loss=0.0001857, whisper_loss=0.107, over 13456.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01085, ecapa_loss=0.0001595, whisper_loss=0.09151, over 3833141.85 frames. ], batch size: 53, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:47:45,675 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-13 15:47:55,947 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-13 15:48:01,991 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.332e+01 2.611e+01 2.967e+01 5.729e+01, threshold=5.223e+01, percent-clipped=1.0 2024-08-13 15:48:05,134 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.17 vs. limit=15.0 2024-08-13 15:48:16,342 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.48 vs. limit=15.0 2024-08-13 15:48:20,314 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 31 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-13 15:48:27,176 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.38 vs. limit=15.0 2024-08-13 15:48:37,381 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-13 15:48:54,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2196710.0, ans=0.0 2024-08-13 15:49:00,483 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 2300, loss[loss=0.1125, beats_loss=0.01059, ecapa_loss=0.0001462, whisper_loss=0.1004, over 17258.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.01085, ecapa_loss=0.0001608, whisper_loss=0.09154, over 3856335.09 frames. ], batch size: 66, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:49:01,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2196810.0, ans=0.125 2024-08-13 15:49:02,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2196810.0, ans=0.0 2024-08-13 15:49:07,825 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.97 vs. limit=15.0 2024-08-13 15:49:26,629 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.15 vs. limit=15.0 2024-08-13 15:49:28,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2196910.0, ans=0.1 2024-08-13 15:49:29,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2197010.0, ans=0.125 2024-08-13 15:49:42,041 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.94 vs. limit=6.0 2024-08-13 15:49:47,065 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 31 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 15:49:54,436 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 
17 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-13 15:50:04,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2197210.0, ans=0.125 2024-08-13 15:50:14,888 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 2350, loss[loss=0.1026, beats_loss=0.00868, ecapa_loss=0.0002058, whisper_loss=0.09187, over 19372.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01076, ecapa_loss=0.0001625, whisper_loss=0.09196, over 3850779.79 frames. ], batch size: 78, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:50:18,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=2197310.0, ans=0.05 2024-08-13 15:50:31,450 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.477e+01 2.777e+01 3.066e+01 6.337e+01, threshold=5.554e+01, percent-clipped=1.0 2024-08-13 15:50:39,098 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-13 15:50:42,539 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-13 15:50:45,333 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-13 15:50:45,852 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.07 vs. 
limit=15.0 2024-08-13 15:50:58,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2197510.0, ans=10.0 2024-08-13 15:50:59,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2197610.0, ans=10.0 2024-08-13 15:51:10,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2197610.0, ans=0.125 2024-08-13 15:51:11,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2197610.0, ans=0.0 2024-08-13 15:51:11,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2197610.0, ans=0.125 2024-08-13 15:51:14,313 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-13 15:51:30,141 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 2400, loss[loss=0.1161, beats_loss=0.01043, ecapa_loss=0.0001342, whisper_loss=0.1044, over 21718.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01083, ecapa_loss=0.0001612, whisper_loss=0.09216, over 3861949.97 frames. ], batch size: 83, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:51:30,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2197810.0, ans=0.2 2024-08-13 15:51:38,666 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=15.0 2024-08-13 15:51:43,547 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 16 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-13 15:51:59,027 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.25 vs. 
limit=15.0 2024-08-13 15:52:09,502 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 25 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-13 15:52:14,943 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 26 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-13 15:52:29,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2198210.0, ans=0.125 2024-08-13 15:52:41,997 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-08-13 15:52:42,502 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 2450, loss[loss=0.1096, beats_loss=0.0114, ecapa_loss=0.0001693, whisper_loss=0.09653, over 21931.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01076, ecapa_loss=0.0001598, whisper_loss=0.09249, over 3871111.23 frames. ], batch size: 89, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:52:50,117 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-13 15:52:54,826 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2198310.0, ans=0.2 2024-08-13 15:52:58,744 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.483e+01 2.773e+01 3.111e+01 4.520e+01, threshold=5.546e+01, percent-clipped=0.0 2024-08-13 15:53:04,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2198410.0, ans=0.2 2024-08-13 15:53:20,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2198510.0, ans=0.2 2024-08-13 15:53:22,460 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.47 vs. 
limit=15.0 2024-08-13 15:53:26,112 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.76 vs. limit=6.0 2024-08-13 15:53:31,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2198610.0, ans=0.0 2024-08-13 15:53:35,106 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-13 15:53:45,959 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.28 vs. limit=15.0 2024-08-13 15:53:49,711 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.675e+01 2024-08-13 15:53:53,745 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 2500, loss[loss=0.1215, beats_loss=0.005757, ecapa_loss=0.000187, whisper_loss=0.1139, over 17052.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01071, ecapa_loss=0.0001611, whisper_loss=0.09279, over 3879573.32 frames. ], batch size: 64, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:53:58,217 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-13 15:54:12,620 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 15:54:34,605 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-13 15:54:47,283 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
24 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 15:54:50,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2199210.0, ans=0.125 2024-08-13 15:54:52,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2199210.0, ans=0.1 2024-08-13 15:54:52,374 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.72 vs. limit=15.0 2024-08-13 15:54:55,720 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 14 from Vox, 48 fro AS 2024-08-13 15:55:06,665 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 2550, loss[loss=0.09275, beats_loss=0.01362, ecapa_loss=0.0001389, whisper_loss=0.07775, over 18636.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01068, ecapa_loss=0.0001607, whisper_loss=0.09273, over 3884238.17 frames. ], batch size: 75, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:55:12,294 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 15:55:21,466 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.351e+01 2.676e+01 3.107e+01 6.569e+01, threshold=5.353e+01, percent-clipped=1.0 2024-08-13 15:55:51,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2199610.0, ans=0.0 2024-08-13 15:55:58,322 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.18 vs. 
limit=15.0 2024-08-13 15:56:00,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2199610.0, ans=0.05 2024-08-13 15:56:03,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2199710.0, ans=0.5 2024-08-13 15:56:04,396 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 13 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-13 15:56:17,875 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 2600, loss[loss=0.08146, beats_loss=0.01379, ecapa_loss=0.0001076, whisper_loss=0.06659, over 22959.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01071, ecapa_loss=0.0001604, whisper_loss=0.09202, over 3871360.63 frames. ], batch size: 90, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:56:18,004 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-13 15:56:21,174 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 23 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-13 15:56:21,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2199810.0, ans=0.2 2024-08-13 15:56:25,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2199810.0, ans=0.2 2024-08-13 15:56:25,923 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.44 vs. limit=22.5 2024-08-13 15:56:30,613 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.12 vs. 
limit=12.0 2024-08-13 15:56:33,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2199910.0, ans=0.125 2024-08-13 15:56:59,799 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.36 vs. limit=15.0 2024-08-13 15:57:05,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2200110.0, ans=0.0 2024-08-13 15:57:30,910 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.320e-01 2024-08-13 15:57:33,069 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 2650, loss[loss=0.1195, beats_loss=0.009609, ecapa_loss=0.0001526, whisper_loss=0.1084, over 18198.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01073, ecapa_loss=0.0001611, whisper_loss=0.09178, over 3862549.12 frames. ], batch size: 70, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:57:49,550 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.280e+01 2.561e+01 2.894e+01 4.049e+01, threshold=5.122e+01, percent-clipped=0.0 2024-08-13 15:57:52,674 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 15 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-13 15:58:27,551 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 15:58:32,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2200710.0, ans=0.2 2024-08-13 15:58:52,148 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 2700, loss[loss=0.08821, beats_loss=0.01124, ecapa_loss=0.0001805, whisper_loss=0.07517, over 19914.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01069, ecapa_loss=0.0001613, whisper_loss=0.09179, over 3881467.93 frames. 
], batch size: 83, lr: 3.96e-03, grad_scale: 5.764607523034235e+17
2024-08-13 15:58:56,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2200810.0, ans=0.125
2024-08-13 15:59:01,976 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 24 from Vox, 30 from AS
2024-08-13 15:59:04,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2200810.0, ans=0.125
2024-08-13 15:59:06,107 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.45 vs. limit=15.0
2024-08-13 15:59:17,404 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.360e+05
2024-08-13 15:59:18,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2200910.0, ans=0.0
2024-08-13 15:59:33,461 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 25 from LS+wenet, 17 from Vox, 28 from AS
2024-08-13 15:59:42,005 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 22 from Vox, 25 from AS
2024-08-13 15:59:48,027 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 21 from Vox, 28 from AS
2024-08-13 16:00:17,257 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 2750, loss[loss=0.09597, beats_loss=0.01138, ecapa_loss=0.0001308, whisper_loss=0.08328, over 16471.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01069, ecapa_loss=0.0001604, whisper_loss=0.09141, over 3856404.05 frames. ], batch size: 64, lr: 3.96e-03, grad_scale: 5.764607523034235e+17
2024-08-13 16:00:17,953 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=15.0
2024-08-13 16:00:34,191 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.357e+01 2.643e+01 3.055e+01 4.900e+01, threshold=5.285e+01, percent-clipped=0.0
2024-08-13 16:00:44,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2201410.0, ans=0.125
2024-08-13 16:00:54,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=2201510.0, ans=15.0
2024-08-13 16:00:55,958 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 28 from LS+wenet, 23 from Vox, 44 from AS
2024-08-13 16:01:30,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2201710.0, ans=0.1
2024-08-13 16:01:33,833 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 2800, loss[loss=0.1129, beats_loss=0.01129, ecapa_loss=0.0001646, whisper_loss=0.09995, over 22577.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01066, ecapa_loss=0.0001607, whisper_loss=0.0922, over 3879580.40 frames. ], batch size: 89, lr: 3.96e-03, grad_scale: 5.764607523034235e+17
2024-08-13 16:01:45,641 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.57 vs. limit=10.0
2024-08-13 16:01:49,339 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 19 from LS+wenet, 14 from Vox, 20 from AS
2024-08-13 16:02:37,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2202210.0, ans=0.0
2024-08-13 16:02:37,625 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.57 vs. limit=15.0
2024-08-13 16:02:39,990 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 20 from Vox, 45 from AS
2024-08-13 16:02:50,898 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 2850, loss[loss=0.09599, beats_loss=0.009919, ecapa_loss=0.0001519, whisper_loss=0.08455, over 17962.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0107, ecapa_loss=0.0001607, whisper_loss=0.09181, over 3866888.20 frames. ], batch size: 72, lr: 3.96e-03, grad_scale: 5.764607523034235e+17
2024-08-13 16:02:52,957 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 from AS
2024-08-13 16:02:54,658 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 14 from Vox, 27 from AS
2024-08-13 16:03:03,875 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.46 vs. limit=15.0
2024-08-13 16:03:08,644 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.307e+01 2.674e+01 3.004e+01 5.549e+01, threshold=5.349e+01, percent-clipped=1.0
2024-08-13 16:03:12,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2202410.0, ans=0.07
2024-08-13 16:03:48,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2202610.0, ans=0.1
2024-08-13 16:04:06,964 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 21 from Vox, 29 from AS
2024-08-13 16:04:10,240 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 2900, loss[loss=0.1107, beats_loss=0.009261, ecapa_loss=0.0001815, whisper_loss=0.09959, over 15795.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01065, ecapa_loss=0.0001624, whisper_loss=0.09218, over 3875329.10 frames. ], batch size: 64, lr: 3.96e-03, grad_scale: 5.764607523034235e+17
2024-08-13 16:04:11,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2202810.0, ans=0.125
2024-08-13 16:04:17,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2202810.0, ans=0.09899494936611666
2024-08-13 16:04:18,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2202810.0, ans=0.125
2024-08-13 16:04:31,876 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.51 vs. limit=15.0
2024-08-13 16:04:35,318 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 25 from Vox, 36 from AS
2024-08-13 16:04:53,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2203010.0, ans=0.0
2024-08-13 16:04:58,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2203110.0, ans=0.2
2024-08-13 16:04:59,481 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0
2024-08-13 16:05:08,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2203210.0, ans=0.0
2024-08-13 16:05:20,728 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 21 from Vox, 23 from AS
2024-08-13 16:05:23,050 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 2950, loss[loss=0.1024, beats_loss=0.0126, ecapa_loss=0.0001599, whisper_loss=0.08825, over 23364.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01071, ecapa_loss=0.0001624, whisper_loss=0.09212, over 3848957.69 frames. ], batch size: 96, lr: 3.96e-03, grad_scale: 5.764607523034235e+17
2024-08-13 16:05:25,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2203310.0, ans=0.0
2024-08-13 16:05:28,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2203310.0, ans=0.125
2024-08-13 16:05:29,749 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.99 vs. limit=6.0
2024-08-13 16:05:38,920 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.341e+01 2.613e+01 3.038e+01 5.265e+01, threshold=5.226e+01, percent-clipped=0.0
2024-08-13 16:05:58,678 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 23 from Vox, 38 from AS
2024-08-13 16:06:15,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2203610.0, ans=0.125
2024-08-13 16:06:26,027 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 16:06:32,159 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 3000, loss[loss=0.1215, beats_loss=0.009375, ecapa_loss=0.0001856, whisper_loss=0.1103, over 22235.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01076, ecapa_loss=0.0001614, whisper_loss=0.0925, over 3888066.79 frames. ], batch size: 92, lr: 3.96e-03, grad_scale: 5.764607523034235e+17
2024-08-13 16:06:32,160 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss
2024-08-13 16:07:12,396 INFO [train_multi_KD3.py:1149] (2/4) Epoch 16, validation on ASR_libri: loss=0.253, beats_loss=0, ecapa_loss=0.0005592, whisper_loss=0.2474, over 922467.00 frames.
2024-08-13 16:07:30,375 INFO [train_multi_KD3.py:1149] (2/4) Epoch 16, validation on SV_voxceleb1: loss=0.004334, beats_loss=0, ecapa_loss=0.0004334, whisper_loss=0, over 939242.00 frames.
2024-08-13 16:09:55,778 INFO [train_multi_KD3.py:1149] (2/4) Epoch 16, validation on AT_audioset: loss=0.02373, beats_loss=0.02373, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-13 16:09:55,784 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB
2024-08-13 16:10:10,966 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.20 vs. limit=15.0
2024-08-13 16:10:32,384 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 26 from LS+wenet, 15 from Vox, 31 from AS
2024-08-13 16:10:51,372 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.22 vs. limit=15.0
2024-08-13 16:11:04,032 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 10 from LS+wenet, 16 from Vox, 32 from AS
2024-08-13 16:11:11,080 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 20 from Vox, 43 from AS
2024-08-13 16:11:27,486 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 3050, loss[loss=0.07851, beats_loss=0.014, ecapa_loss=0.0001154, whisper_loss=0.06336, over 13233.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01076, ecapa_loss=0.0001621, whisper_loss=0.09247, over 3878712.33 frames. ], batch size: 53, lr: 3.96e-03, grad_scale: 5.764607523034235e+17
2024-08-13 16:11:30,011 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.04 vs. limit=10.0
2024-08-13 16:11:42,243 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.448e+01 2.786e+01 3.089e+01 5.850e+01, threshold=5.572e+01, percent-clipped=2.0
2024-08-13 16:11:42,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2204410.0, ans=0.0
2024-08-13 16:11:49,448 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 15 from LS+wenet, 23 from Vox, 33 from AS
2024-08-13 16:11:54,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2204510.0, ans=0.0
2024-08-13 16:12:00,021 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 26 from Vox, 41 from AS
2024-08-13 16:12:09,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2204610.0, ans=0.0
2024-08-13 16:12:11,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2204610.0, ans=0.0
2024-08-13 16:12:28,720 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 35 from LS+wenet, 17 from Vox, 32 from AS
2024-08-13 16:12:35,605 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 3100, loss[loss=0.1068, beats_loss=0.01175, ecapa_loss=0.0001359, whisper_loss=0.0937, over 21439.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01072, ecapa_loss=0.0001629, whisper_loss=0.09253, over 3884380.05 frames. ], batch size: 85, lr: 3.96e-03, grad_scale: 5.764607523034235e+17
2024-08-13 16:12:36,286 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.45 vs. limit=15.0
2024-08-13 16:12:43,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2204810.0, ans=0.0
2024-08-13 16:12:49,779 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0
2024-08-13 16:13:04,113 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.344e-02
2024-08-13 16:13:09,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2205010.0, ans=0.1
2024-08-13 16:13:09,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2205010.0, ans=0.125
2024-08-13 16:13:13,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2205010.0, ans=0.125
2024-08-13 16:13:16,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2205110.0, ans=0.125
2024-08-13 16:13:23,114 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 20 from Vox, 29 from AS
2024-08-13 16:13:25,881 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 22 from Vox, 39 from AS
2024-08-13 16:13:29,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2205110.0, ans=0.0
2024-08-13 16:13:40,461 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.123e+01
2024-08-13 16:13:45,566 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 3150, loss[loss=0.117, beats_loss=0.009416, ecapa_loss=0.0001654, whisper_loss=0.1059, over 23581.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01077, ecapa_loss=0.0001626, whisper_loss=0.09212, over 3882076.57 frames. ], batch size: 94, lr: 3.95e-03, grad_scale: 5.764607523034235e+17
2024-08-13 16:13:56,770 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 24 from Vox, 19 from AS
2024-08-13 16:14:00,897 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.370e+01 2.702e+01 3.002e+01 4.700e+01, threshold=5.405e+01, percent-clipped=0.0
2024-08-13 16:14:02,493 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 23 from Vox, 37 from AS
2024-08-13 16:14:03,082 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.64 vs. limit=22.5
2024-08-13 16:14:07,268 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.15 vs. limit=15.0
2024-08-13 16:14:21,383 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 13 from Vox, 33 from AS
2024-08-13 16:14:30,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2205610.0, ans=0.015
2024-08-13 16:14:53,661 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.78 vs. limit=10.0
2024-08-13 16:14:54,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2205710.0, ans=0.0
2024-08-13 16:14:56,965 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 3200, loss[loss=0.1048, beats_loss=0.008058, ecapa_loss=0.0001686, whisper_loss=0.09502, over 14687.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01077, ecapa_loss=0.000164, whisper_loss=0.09243, over 3884218.35 frames. ], batch size: 55, lr: 3.95e-03, grad_scale: 5.764607523034235e+17
2024-08-13 16:14:58,462 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 28 from Vox, 34 from AS
2024-08-13 16:15:17,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2205910.0, ans=0.1
2024-08-13 16:15:18,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2205910.0, ans=0.0
2024-08-13 16:15:20,105 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 20 from Vox, 44 from AS
2024-08-13 16:15:21,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2205910.0, ans=0.125
2024-08-13 16:15:32,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2206010.0, ans=0.025
2024-08-13 16:15:36,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2206010.0, ans=0.1
2024-08-13 16:15:50,404 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 37 from LS+wenet, 26 from Vox, 30 from AS
2024-08-13 16:15:54,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=2206110.0, ans=0.95
2024-08-13 16:16:10,663 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 3250, loss[loss=0.1026, beats_loss=0.01016, ecapa_loss=0.0001592, whisper_loss=0.0908, over 19421.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01081, ecapa_loss=0.0001647, whisper_loss=0.0923, over 3889791.36 frames. ], batch size: 76, lr: 3.95e-03, grad_scale: 5.764607523034235e+17
2024-08-13 16:16:12,687 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=11.64 vs. limit=12.0
2024-08-13 16:16:13,444 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 21 from LS+wenet, 22 from Vox, 32 from AS
2024-08-13 16:16:25,164 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.428e+01 2.754e+01 3.023e+01 4.086e+01, threshold=5.507e+01, percent-clipped=0.0
2024-08-13 16:16:25,277 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 from AS
2024-08-13 16:16:28,544 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 from AS
2024-08-13 16:16:35,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2206410.0, ans=0.2
2024-08-13 16:16:38,594 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. limit=6.0
2024-08-13 16:17:09,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2206710.0, ans=0.0
2024-08-13 16:17:13,925 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.24 vs. limit=15.0
2024-08-13 16:17:14,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2206710.0, ans=0.2
2024-08-13 16:17:22,016 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 3300, loss[loss=0.1184, beats_loss=0.009177, ecapa_loss=0.0001183, whisper_loss=0.1081, over 20733.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.0108, ecapa_loss=0.0001634, whisper_loss=0.09227, over 3903941.41 frames. ], batch size: 77, lr: 3.95e-03, grad_scale: 5.764607523034235e+17
2024-08-13 16:17:58,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2207010.0, ans=0.0
2024-08-13 16:18:05,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2207110.0, ans=0.125
2024-08-13 16:18:07,185 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.41 vs. limit=15.0
2024-08-13 16:18:21,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2207210.0, ans=0.05
2024-08-13 16:18:33,284 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 3350, loss[loss=0.08765, beats_loss=0.01228, ecapa_loss=0.0001795, whisper_loss=0.07357, over 17842.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01088, ecapa_loss=0.0001624, whisper_loss=0.09208, over 3909730.97 frames. ], batch size: 76, lr: 3.95e-03, grad_scale: 5.764607523034235e+17
2024-08-13 16:18:40,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2207310.0, ans=0.2
2024-08-13 16:18:49,651 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.421e+01 2.639e+01 2.919e+01 4.017e+01, threshold=5.278e+01, percent-clipped=0.0
2024-08-13 16:18:53,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2207410.0, ans=0.125
2024-08-13 16:19:01,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2207410.0, ans=0.125
2024-08-13 16:19:02,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2207510.0, ans=0.025
2024-08-13 16:19:04,532 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.07 vs. limit=15.0
2024-08-13 16:19:10,539 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.00 vs. limit=15.0
2024-08-13 16:19:10,670 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.05 vs. limit=22.5
2024-08-13 16:19:16,752 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 15 from LS+wenet, 23 from Vox, 24 from AS
2024-08-13 16:19:16,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2207610.0, ans=0.5
2024-08-13 16:19:19,948 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.57 vs. limit=15.0
2024-08-13 16:19:22,256 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 18 from Vox, 44 from AS
2024-08-13 16:19:24,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2207610.0, ans=0.125
2024-08-13 16:19:29,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2207610.0, ans=0.125
2024-08-13 16:19:35,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2207710.0, ans=0.0
2024-08-13 16:19:45,945 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 3400, loss[loss=0.09548, beats_loss=0.01146, ecapa_loss=0.0001625, whisper_loss=0.08239, over 18232.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01082, ecapa_loss=0.0001622, whisper_loss=0.09177, over 3867467.42 frames. ], batch size: 72, lr: 3.95e-03, grad_scale: 1.152921504606847e+18
2024-08-13 16:19:50,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2207810.0, ans=0.1
2024-08-13 16:19:51,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2207810.0, ans=0.2
2024-08-13 16:19:52,553 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 22 from LS+wenet, 16 from Vox, 23 from AS
2024-08-13 16:20:05,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2207910.0, ans=0.125
2024-08-13 16:20:21,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2208010.0, ans=0.0
2024-08-13 16:20:31,637 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.59 vs. limit=15.0
2024-08-13 16:20:35,785 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 18 from Vox, 30 from AS
2024-08-13 16:20:38,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2208110.0, ans=0.0
2024-08-13 16:20:52,406 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 27 from LS+wenet, 11 from Vox, 34 from AS
2024-08-13 16:20:56,400 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 3450, loss[loss=0.1154, beats_loss=0.009655, ecapa_loss=0.0001698, whisper_loss=0.104, over 22608.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01078, ecapa_loss=0.0001627, whisper_loss=0.09192, over 3903374.61 frames. ], batch size: 89, lr: 3.95e-03, grad_scale: 1.152921504606847e+18
2024-08-13 16:21:11,658 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.477e+01 2.807e+01 3.322e+01 1.527e+02, threshold=5.614e+01, percent-clipped=5.0
2024-08-13 16:21:21,623 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=8.469e+01
2024-08-13 16:21:21,716 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0
2024-08-13 16:21:31,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2208510.0, ans=0.125
2024-08-13 16:21:33,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2208510.0, ans=0.125
2024-08-13 16:21:34,280 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 12 from LS+wenet, 30 from Vox, 35 from AS
2024-08-13 16:22:01,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2208710.0, ans=0.125
2024-08-13 16:22:06,948 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 3500, loss[loss=0.09807, beats_loss=0.01175, ecapa_loss=0.0001561, whisper_loss=0.08476, over 22482.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01082, ecapa_loss=0.0001622, whisper_loss=0.09121, over 3884316.94 frames. ], batch size: 90, lr: 3.95e-03, grad_scale: 1.152921504606847e+18
2024-08-13 16:22:07,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2208810.0, ans=0.1
2024-08-13 16:22:09,711 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.06 vs. limit=15.0
2024-08-13 16:22:12,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2208810.0, ans=0.125
2024-08-13 16:22:18,268 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 30 from LS+wenet, 13 from Vox, 25 from AS
2024-08-13 16:22:26,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2208910.0, ans=0.0
2024-08-13 16:22:30,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2208910.0, ans=0.0
2024-08-13 16:22:41,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2209010.0, ans=0.1
2024-08-13 16:22:42,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2209010.0, ans=0.125
2024-08-13 16:22:55,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2209110.0, ans=0.0
2024-08-13 16:23:03,533 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 17 from Vox, 29 from AS
2024-08-13 16:23:12,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2209210.0, ans=0.125
2024-08-13 16:23:13,588 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 26 from Vox, 18 from AS
2024-08-13 16:23:13,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2209210.0, ans=0.125
2024-08-13 16:23:20,194 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 3550, loss[loss=0.1185, beats_loss=0.009658, ecapa_loss=0.0001441, whisper_loss=0.1074, over 23269.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01081, ecapa_loss=0.0001628, whisper_loss=0.09111, over 3868198.32 frames. ], batch size: 89, lr: 3.95e-03, grad_scale: 1.152921504606847e+18
2024-08-13 16:23:27,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2209310.0, ans=0.2
2024-08-13 16:23:29,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2209310.0, ans=0.125
2024-08-13 16:23:32,011 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 21 from Vox, 21 from AS
2024-08-13 16:23:36,566 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.460e+01 2.758e+01 3.003e+01 5.341e+01, threshold=5.516e+01, percent-clipped=0.0
2024-08-13 16:23:51,313 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 from AS
2024-08-13 16:24:13,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2209610.0, ans=0.0
2024-08-13 16:24:13,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2209610.0, ans=0.0
2024-08-13 16:24:16,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2209610.0, ans=0.0
2024-08-13 16:24:20,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2209710.0, ans=0.1
2024-08-13 16:24:23,914 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.91 vs. limit=10.0
2024-08-13 16:24:25,470 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.04 vs. limit=10.0
2024-08-13 16:24:26,058 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 21 from LS+wenet, 26 from Vox, 46 from AS
2024-08-13 16:24:34,773 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 3600, loss[loss=0.131, beats_loss=0.009466, ecapa_loss=0.0001231, whisper_loss=0.1203, over 21664.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01074, ecapa_loss=0.0001625, whisper_loss=0.09174, over 3858976.47 frames. ], batch size: 80, lr: 3.95e-03, grad_scale: 1.152921504606847e+18
2024-08-13 16:24:46,651 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 22 from Vox, 37 from AS
2024-08-13 16:25:04,720 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.78 vs. limit=15.0
2024-08-13 16:25:13,317 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 19 from Vox, 43 from AS
2024-08-13 16:25:16,598 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 12 from LS+wenet, 21 from Vox, 27 from AS
2024-08-13 16:25:29,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2210110.0, ans=0.125
2024-08-13 16:25:46,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2210310.0, ans=0.1
2024-08-13 16:25:46,879 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 3650, loss[loss=0.1153, beats_loss=0.01137, ecapa_loss=0.0001541, whisper_loss=0.1024, over 21927.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01076, ecapa_loss=0.0001635, whisper_loss=0.09126, over 3838972.04 frames. ], batch size: 88, lr: 3.95e-03, grad_scale: 1.152921504606847e+18
2024-08-13 16:25:54,987 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 24 from Vox, 31 from AS
2024-08-13 16:26:01,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2210410.0, ans=0.125
2024-08-13 16:26:02,830 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.319e+01 2.686e+01 3.119e+01 4.845e+01, threshold=5.372e+01, percent-clipped=0.0
2024-08-13 16:26:05,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2210410.0, ans=0.125
2024-08-13 16:26:07,069 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 18 from Vox, 43 from AS
2024-08-13 16:26:08,255 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 14 from Vox, 25 from AS
2024-08-13 16:26:31,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2210610.0, ans=0.125
2024-08-13 16:26:42,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2210710.0, ans=0.0
2024-08-13 16:26:52,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2210710.0, ans=0.125
2024-08-13 16:26:56,170 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 3700, loss[loss=0.1262, beats_loss=0.009131, ecapa_loss=0.0001945, whisper_loss=0.1151, over 16551.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01078, ecapa_loss=0.0001636, whisper_loss=0.09195, over 3837723.21 frames. ], batch size: 68, lr: 3.95e-03, grad_scale: 1.152921504606847e+18
2024-08-13 16:27:14,158 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0
2024-08-13 16:27:21,589 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.96 vs. limit=15.0
2024-08-13 16:27:37,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2211110.0, ans=0.0
2024-08-13 16:27:44,122 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 29 from LS+wenet, 22 from Vox, 45 from AS
2024-08-13 16:28:02,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2211310.0, ans=0.2
2024-08-13 16:28:03,114 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 3750, loss[loss=0.1165, beats_loss=0.01202, ecapa_loss=0.0001493, whisper_loss=0.103, over 15857.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01084, ecapa_loss=0.0001627, whisper_loss=0.0918, over 3853255.40 frames. ], batch size: 61, lr: 3.95e-03, grad_scale: 1.152921504606847e+18
2024-08-13 16:28:09,280 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.59 vs. limit=15.0
2024-08-13 16:28:17,734 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.410e+01 2.677e+01 3.009e+01 6.113e+01, threshold=5.354e+01, percent-clipped=1.0
2024-08-13 16:28:18,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2211410.0, ans=0.125
2024-08-13 16:28:29,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2211510.0, ans=0.0
2024-08-13 16:28:45,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2211610.0, ans=0.125
2024-08-13 16:29:08,275 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 3800, loss[loss=0.08881, beats_loss=0.01203, ecapa_loss=0.0001549, whisper_loss=0.07523, over 21847.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0108, ecapa_loss=0.000164, whisper_loss=0.09151, over 3874463.51 frames. ], batch size: 90, lr: 3.95e-03, grad_scale: 1.152921504606847e+18
2024-08-13 16:29:31,826 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2211910.0, ans=0.125
2024-08-13 16:29:33,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2212010.0, ans=0.2
2024-08-13 16:29:55,202 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 19 from Vox, 24 from AS
2024-08-13 16:29:55,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2212110.0, ans=0.1
2024-08-13 16:30:13,173 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 3850, loss[loss=0.08971, beats_loss=0.01271, ecapa_loss=0.0001504, whisper_loss=0.07549, over 22096.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01082, ecapa_loss=0.0001643, whisper_loss=0.09106, over 3845572.78 frames. ], batch size: 92, lr: 3.95e-03, grad_scale: 1.152921504606847e+18
2024-08-13 16:30:27,500 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.456e+01 2.760e+01 3.181e+01 8.437e+01, threshold=5.521e+01, percent-clipped=2.0
2024-08-13 16:30:27,687 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 24 from Vox, 30 from AS
2024-08-13 16:30:34,199 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 21 from LS+wenet, 24 from Vox, 30 from AS
2024-08-13 16:30:38,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2212510.0, ans=0.125
2024-08-13 16:30:40,665 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 17 from LS+wenet, 22 from Vox, 41 from AS
2024-08-13 16:31:18,614 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 3900, loss[loss=0.08884, beats_loss=0.01207, ecapa_loss=0.0001544, whisper_loss=0.07522, over 16584.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01087, ecapa_loss=0.0001642, whisper_loss=0.09062, over 3841108.38 frames. ], batch size: 70, lr: 3.95e-03, grad_scale: 1.152921504606847e+18
2024-08-13 16:31:29,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2212810.0, ans=0.125
2024-08-13 16:31:30,766 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts.
14 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-13 16:31:33,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2212910.0, ans=0.09899494936611666 2024-08-13 16:31:41,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2212910.0, ans=0.0 2024-08-13 16:31:42,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2212910.0, ans=0.1 2024-08-13 16:31:54,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2213010.0, ans=0.125 2024-08-13 16:31:55,423 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-13 16:32:23,153 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 3950, loss[loss=0.09899, beats_loss=0.01133, ecapa_loss=0.0001346, whisper_loss=0.08631, over 24346.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01084, ecapa_loss=0.000164, whisper_loss=0.09131, over 3854832.39 frames. ], batch size: 92, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:32:26,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2213310.0, ans=0.125 2024-08-13 16:32:29,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2213310.0, ans=0.04949747468305833 2024-08-13 16:32:37,360 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.503e+01 2.824e+01 3.168e+01 4.630e+01, threshold=5.649e+01, percent-clipped=0.0 2024-08-13 16:32:56,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2213510.0, ans=0.125 2024-08-13 16:32:57,167 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
25 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 16:33:05,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2213610.0, ans=0.0 2024-08-13 16:33:14,979 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=22.5 2024-08-13 16:33:18,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2213710.0, ans=0.125 2024-08-13 16:33:21,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2213710.0, ans=0.125 2024-08-13 16:33:28,417 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 4000, loss[loss=0.104, beats_loss=0.01102, ecapa_loss=0.0002017, whisper_loss=0.09093, over 18031.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01078, ecapa_loss=0.0001647, whisper_loss=0.09194, over 3862715.82 frames. ], batch size: 76, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:33:56,865 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.24 vs. limit=15.0 2024-08-13 16:33:57,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2214010.0, ans=0.0 2024-08-13 16:34:03,019 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.213e-02 2024-08-13 16:34:29,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2214210.0, ans=0.125 2024-08-13 16:34:33,792 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 4050, loss[loss=0.1221, beats_loss=0.009455, ecapa_loss=0.0001482, whisper_loss=0.1111, over 23076.00 frames. 
], tot_loss[loss=0.1046, beats_loss=0.0107, ecapa_loss=0.0001656, whisper_loss=0.09221, over 3874884.75 frames. ], batch size: 87, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:34:41,730 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.88 vs. limit=15.0 2024-08-13 16:34:46,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2214410.0, ans=0.125 2024-08-13 16:34:48,336 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.510e+01 2.777e+01 3.045e+01 5.508e+01, threshold=5.554e+01, percent-clipped=0.0 2024-08-13 16:34:52,976 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0 2024-08-13 16:34:53,144 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.44 vs. limit=15.0 2024-08-13 16:34:55,082 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 22 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-13 16:35:10,525 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 18 from LS+wenet, 25 from Vox, 50 fro AS 2024-08-13 16:35:22,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2214610.0, ans=0.1 2024-08-13 16:35:36,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2214710.0, ans=0.125 2024-08-13 16:35:38,964 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 4100, loss[loss=0.1172, beats_loss=0.0116, ecapa_loss=0.0001593, whisper_loss=0.104, over 22371.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0108, ecapa_loss=0.0001647, whisper_loss=0.09165, over 3894740.27 frames. 
], batch size: 90, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:35:40,295 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 16:35:48,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2214810.0, ans=0.125 2024-08-13 16:35:52,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2214910.0, ans=0.2 2024-08-13 16:35:53,275 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-13 16:35:59,838 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-13 16:36:00,879 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2024-08-13 16:36:04,564 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.32 vs. limit=15.0 2024-08-13 16:36:16,840 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 22 from Vox, 48 fro AS 2024-08-13 16:36:18,004 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-13 16:36:27,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2215110.0, ans=0.1 2024-08-13 16:36:30,001 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-13 16:36:32,087 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.82 vs. limit=22.5 2024-08-13 16:36:36,287 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
23 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-13 16:36:43,813 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 4150, loss[loss=0.1073, beats_loss=0.01087, ecapa_loss=0.0001656, whisper_loss=0.09477, over 23288.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01081, ecapa_loss=0.0001635, whisper_loss=0.09212, over 3910853.41 frames. ], batch size: 94, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:36:44,405 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.06 vs. limit=15.0 2024-08-13 16:36:49,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2215310.0, ans=0.125 2024-08-13 16:36:55,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2215410.0, ans=0.1 2024-08-13 16:36:57,845 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.337e+01 2.557e+01 2.975e+01 8.257e+01, threshold=5.114e+01, percent-clipped=2.0 2024-08-13 16:37:09,346 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.97 vs. limit=15.0 2024-08-13 16:37:11,273 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 31 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-13 16:37:15,118 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-13 16:37:15,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2215510.0, ans=0.0 2024-08-13 16:37:38,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2215710.0, ans=0.125 2024-08-13 16:37:41,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2215710.0, ans=0.1 2024-08-13 16:37:48,267 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.46 vs. limit=15.0 2024-08-13 16:37:48,715 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 4200, loss[loss=0.1001, beats_loss=0.01046, ecapa_loss=0.0001602, whisper_loss=0.08799, over 19724.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01086, ecapa_loss=0.0001625, whisper_loss=0.09238, over 3914529.42 frames. 
], batch size: 77, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:37:55,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2215810.0, ans=0.125 2024-08-13 16:38:07,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2215910.0, ans=0.0 2024-08-13 16:38:11,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2215910.0, ans=0.125 2024-08-13 16:38:36,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2216110.0, ans=0.125 2024-08-13 16:38:38,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2216110.0, ans=0.2 2024-08-13 16:38:39,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2216110.0, ans=0.125 2024-08-13 16:38:52,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2216210.0, ans=0.125 2024-08-13 16:38:56,480 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 4250, loss[loss=0.1029, beats_loss=0.01112, ecapa_loss=0.0001528, whisper_loss=0.0903, over 21826.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01083, ecapa_loss=0.0001618, whisper_loss=0.09249, over 3910843.23 frames. 
], batch size: 85, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:38:58,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2216310.0, ans=0.125 2024-08-13 16:39:07,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2216310.0, ans=0.125 2024-08-13 16:39:12,423 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.310e+01 2.639e+01 2.854e+01 4.176e+01, threshold=5.278e+01, percent-clipped=0.0 2024-08-13 16:39:14,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2216410.0, ans=0.2 2024-08-13 16:39:20,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2216410.0, ans=0.2 2024-08-13 16:39:26,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2216510.0, ans=0.125 2024-08-13 16:39:32,153 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-13 16:39:45,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2216610.0, ans=0.1 2024-08-13 16:39:48,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2216610.0, ans=0.1 2024-08-13 16:39:48,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2216610.0, ans=0.0 2024-08-13 16:39:51,483 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
21 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-13 16:40:03,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2216710.0, ans=0.2 2024-08-13 16:40:08,548 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-13 16:40:11,857 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 4300, loss[loss=0.09808, beats_loss=0.009254, ecapa_loss=0.0001978, whisper_loss=0.08684, over 19821.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01078, ecapa_loss=0.0001627, whisper_loss=0.09178, over 3867689.56 frames. ], batch size: 86, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:40:22,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2216810.0, ans=0.125 2024-08-13 16:40:51,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2217010.0, ans=0.1 2024-08-13 16:40:55,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2217010.0, ans=0.2 2024-08-13 16:40:58,742 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.02 vs. limit=10.0 2024-08-13 16:41:06,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2217110.0, ans=0.025 2024-08-13 16:41:26,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2217310.0, ans=0.0 2024-08-13 16:41:27,570 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 4350, loss[loss=0.1002, beats_loss=0.01177, ecapa_loss=0.0001702, whisper_loss=0.08669, over 21410.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01078, ecapa_loss=0.0001632, whisper_loss=0.09146, over 3850584.77 frames. ], batch size: 90, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:41:40,236 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.42 vs. limit=15.0 2024-08-13 16:41:40,810 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 17 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-13 16:41:41,869 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.464e+01 2.794e+01 3.090e+01 4.694e+01, threshold=5.588e+01, percent-clipped=0.0 2024-08-13 16:41:44,245 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.16 vs. limit=15.0 2024-08-13 16:41:44,312 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.87 vs. limit=22.5 2024-08-13 16:41:45,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2217410.0, ans=0.125 2024-08-13 16:41:47,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2217410.0, ans=0.125 2024-08-13 16:41:50,115 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 20 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-13 16:41:54,854 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 16:41:55,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2217510.0, ans=0.1 2024-08-13 16:42:01,505 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
19 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 16:42:02,112 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.72 vs. limit=15.0 2024-08-13 16:42:08,607 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.057e+01 2024-08-13 16:42:09,594 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 15 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-13 16:42:32,842 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 4400, loss[loss=0.1108, beats_loss=0.01139, ecapa_loss=0.0001398, whisper_loss=0.09802, over 17680.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01081, ecapa_loss=0.0001617, whisper_loss=0.09149, over 3849280.80 frames. ], batch size: 67, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:42:34,606 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.508e+00 2024-08-13 16:42:38,190 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 20 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-13 16:42:50,766 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.99 vs. limit=15.0 2024-08-13 16:43:07,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2218010.0, ans=0.0 2024-08-13 16:43:11,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2218110.0, ans=0.125 2024-08-13 16:43:26,835 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-13 16:43:37,277 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 4450, loss[loss=0.1238, beats_loss=0.008922, ecapa_loss=0.0001758, whisper_loss=0.1131, over 19114.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01093, ecapa_loss=0.0001603, whisper_loss=0.09086, over 3859434.74 frames. ], batch size: 75, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:43:41,287 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 16:43:44,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2218310.0, ans=0.125 2024-08-13 16:43:49,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2218410.0, ans=0.125 2024-08-13 16:43:51,886 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.377e+01 2.551e+01 3.070e+01 5.212e+01, threshold=5.103e+01, percent-clipped=0.0 2024-08-13 16:43:52,135 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 16:43:55,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2218410.0, ans=0.0 2024-08-13 16:43:57,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2218410.0, ans=0.09899494936611666 2024-08-13 16:44:34,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2218710.0, ans=0.0 2024-08-13 16:44:38,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2218710.0, ans=0.035 2024-08-13 16:44:41,668 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 4500, loss[loss=0.0906, beats_loss=0.01314, ecapa_loss=0.0001584, whisper_loss=0.07587, over 16467.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01095, ecapa_loss=0.0001609, whisper_loss=0.09014, over 3869096.34 frames. 
], batch size: 67, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:44:42,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2218810.0, ans=0.0 2024-08-13 16:44:42,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2218810.0, ans=0.125 2024-08-13 16:44:50,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2218810.0, ans=0.1 2024-08-13 16:45:08,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2219010.0, ans=0.05 2024-08-13 16:45:12,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2219010.0, ans=0.125 2024-08-13 16:45:12,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2219010.0, ans=0.0 2024-08-13 16:45:17,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2219010.0, ans=0.09899494936611666 2024-08-13 16:45:39,823 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0 2024-08-13 16:45:41,170 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 16:45:41,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2219210.0, ans=0.125 2024-08-13 16:45:48,058 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
37 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-13 16:45:53,491 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.35 vs. limit=12.0 2024-08-13 16:45:53,629 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.35 vs. limit=15.0 2024-08-13 16:45:55,023 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 4550, loss[loss=0.08061, beats_loss=0.01275, ecapa_loss=0.0001432, whisper_loss=0.06643, over 17383.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01082, ecapa_loss=0.0001626, whisper_loss=0.09129, over 3897493.12 frames. ], batch size: 69, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:45:58,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2219310.0, ans=0.125 2024-08-13 16:46:12,361 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.690e+01 2.447e+01 2.788e+01 3.187e+01 5.560e+01, threshold=5.575e+01, percent-clipped=2.0 2024-08-13 16:46:14,177 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.63 vs. limit=10.0 2024-08-13 16:46:34,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2219510.0, ans=0.2 2024-08-13 16:46:44,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2219510.0, ans=0.0 2024-08-13 16:46:44,970 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.69 vs. 
limit=15.0 2024-08-13 16:46:57,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2219610.0, ans=0.125 2024-08-13 16:47:12,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2219710.0, ans=0.125 2024-08-13 16:47:31,875 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 4600, loss[loss=0.09178, beats_loss=0.01001, ecapa_loss=0.000187, whisper_loss=0.0799, over 17471.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01082, ecapa_loss=0.000164, whisper_loss=0.09126, over 3905974.98 frames. ], batch size: 72, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:47:34,549 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.94 vs. limit=6.0 2024-08-13 16:47:38,002 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 16:47:41,785 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 15 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-13 16:47:48,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2219810.0, ans=0.0 2024-08-13 16:47:48,529 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.63 vs. limit=10.0 2024-08-13 16:48:20,393 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 15 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-13 16:48:40,960 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.11 vs. 
limit=15.0 2024-08-13 16:48:48,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2220110.0, ans=0.0 2024-08-13 16:48:50,509 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 27 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-13 16:49:03,070 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-13 16:49:24,659 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 4650, loss[loss=0.1111, beats_loss=0.01018, ecapa_loss=0.000123, whisper_loss=0.09965, over 17636.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01079, ecapa_loss=0.0001632, whisper_loss=0.09123, over 3877136.78 frames. ], batch size: 66, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:49:45,938 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 21 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-13 16:49:46,526 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.50 vs. limit=10.0 2024-08-13 16:49:50,315 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.519e+01 2.721e+01 2.978e+01 4.976e+01, threshold=5.443e+01, percent-clipped=0.0 2024-08-13 16:49:50,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=2220410.0, ans=0.5 2024-08-13 16:49:50,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2220410.0, ans=0.1 2024-08-13 16:50:04,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2220410.0, ans=0.0 2024-08-13 16:50:08,250 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-13 16:50:12,395 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
21 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 16:50:22,343 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.57 vs. limit=15.0 2024-08-13 16:50:37,315 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 37 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-13 16:50:40,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2220610.0, ans=0.125 2024-08-13 16:50:56,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2220710.0, ans=0.125 2024-08-13 16:51:16,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2220710.0, ans=0.1 2024-08-13 16:51:19,521 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 4700, loss[loss=0.1046, beats_loss=0.01153, ecapa_loss=0.0001355, whisper_loss=0.09175, over 21189.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01078, ecapa_loss=0.0001633, whisper_loss=0.09135, over 3875953.03 frames. ], batch size: 81, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:51:21,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2220810.0, ans=0.125 2024-08-13 16:51:41,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2220910.0, ans=0.0 2024-08-13 16:51:50,487 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-13 16:52:00,439 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.07 vs. limit=6.0 2024-08-13 16:52:24,308 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
21 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-13 16:52:38,593 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0 2024-08-13 16:52:53,390 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.62 vs. limit=12.0 2024-08-13 16:52:57,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2221210.0, ans=0.125 2024-08-13 16:53:01,957 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 4750, loss[loss=0.1005, beats_loss=0.009581, ecapa_loss=0.0002334, whisper_loss=0.08859, over 15343.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01077, ecapa_loss=0.0001627, whisper_loss=0.09112, over 3859071.59 frames. ], batch size: 66, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:53:17,625 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.389e+01 2.725e+01 3.065e+01 4.342e+01, threshold=5.451e+01, percent-clipped=0.0 2024-08-13 16:53:26,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2221410.0, ans=0.0 2024-08-13 16:53:44,038 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 16:53:51,450 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.36 vs. limit=15.0 2024-08-13 16:54:07,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2221710.0, ans=0.05 2024-08-13 16:54:14,496 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 4800, loss[loss=0.09166, beats_loss=0.009742, ecapa_loss=0.0001868, whisper_loss=0.08005, over 18706.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01079, ecapa_loss=0.0001633, whisper_loss=0.091, over 3846276.77 frames. ], batch size: 75, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:54:19,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2221810.0, ans=0.1 2024-08-13 16:54:23,222 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.68 vs. limit=12.0 2024-08-13 16:54:37,365 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.04 vs. limit=15.0 2024-08-13 16:54:39,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2221910.0, ans=0.125 2024-08-13 16:54:52,749 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-13 16:54:56,698 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 20 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 16:55:18,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2222210.0, ans=0.125 2024-08-13 16:55:20,786 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 28 from Vox, 25 fro AS 2024-08-13 16:55:23,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2222210.0, ans=0.0 2024-08-13 16:55:35,389 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 4850, loss[loss=0.09774, beats_loss=0.01212, ecapa_loss=0.0001619, whisper_loss=0.084, over 22647.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01079, ecapa_loss=0.0001634, whisper_loss=0.09109, over 3884877.44 frames. ], batch size: 93, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:55:37,157 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
23 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 16:55:52,644 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.037e+01 2.475e+01 2.681e+01 3.157e+01 5.324e+01, threshold=5.362e+01, percent-clipped=0.0 2024-08-13 16:55:59,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2222410.0, ans=0.125 2024-08-13 16:55:59,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2222410.0, ans=0.125 2024-08-13 16:56:11,762 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-13 16:56:23,781 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-13 16:56:42,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2222710.0, ans=0.0 2024-08-13 16:56:51,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2222810.0, ans=15.0 2024-08-13 16:56:51,819 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 4900, loss[loss=0.09991, beats_loss=0.01022, ecapa_loss=0.0001755, whisper_loss=0.08794, over 22990.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01077, ecapa_loss=0.0001624, whisper_loss=0.09116, over 3877554.57 frames. 
], batch size: 92, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:57:12,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2222910.0, ans=0.0 2024-08-13 16:58:08,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2223310.0, ans=0.125 2024-08-13 16:58:08,915 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 4950, loss[loss=0.1096, beats_loss=0.00945, ecapa_loss=0.0001532, whisper_loss=0.09864, over 18767.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01076, ecapa_loss=0.0001633, whisper_loss=0.09118, over 3864167.42 frames. ], batch size: 73, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:58:10,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2223310.0, ans=0.125 2024-08-13 16:58:13,214 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-13 16:58:15,513 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.12 vs. limit=10.0 2024-08-13 16:58:26,187 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.318e+01 2.496e+01 2.819e+01 1.833e+02, threshold=4.991e+01, percent-clipped=1.0 2024-08-13 16:58:27,856 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-13 16:59:05,091 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 16:59:06,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2223610.0, ans=0.0 2024-08-13 16:59:15,695 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
27 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-13 16:59:18,790 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 18 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-13 16:59:25,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2223810.0, ans=0.5 2024-08-13 16:59:26,529 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 5000, loss[loss=0.1221, beats_loss=0.01004, ecapa_loss=0.0001683, whisper_loss=0.1104, over 23886.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01076, ecapa_loss=0.0001649, whisper_loss=0.09091, over 3866751.05 frames. ], batch size: 92, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:59:30,596 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.82 vs. limit=15.0 2024-08-13 16:59:37,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2223810.0, ans=0.0 2024-08-13 16:59:42,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2223910.0, ans=0.0 2024-08-13 17:00:38,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2224210.0, ans=0.125 2024-08-13 17:00:41,664 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 5050, loss[loss=0.1081, beats_loss=0.01383, ecapa_loss=0.0001418, whisper_loss=0.09281, over 16993.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01084, ecapa_loss=0.0001638, whisper_loss=0.09113, over 3874430.04 frames. 
], batch size: 66, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:00:42,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2224310.0, ans=0.125 2024-08-13 17:01:00,460 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.341e+01 2.653e+01 3.152e+01 4.271e+01, threshold=5.307e+01, percent-clipped=0.0 2024-08-13 17:01:02,559 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 17:01:22,627 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 36 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 17:01:39,560 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.66 vs. limit=15.0 2024-08-13 17:01:48,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2224710.0, ans=0.0 2024-08-13 17:01:57,618 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 5100, loss[loss=0.116, beats_loss=0.01037, ecapa_loss=0.0001523, whisper_loss=0.1041, over 20038.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01085, ecapa_loss=0.0001629, whisper_loss=0.09165, over 3867204.56 frames. ], batch size: 79, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:02:06,026 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.67 vs. limit=15.0 2024-08-13 17:02:15,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2224910.0, ans=0.0 2024-08-13 17:02:17,432 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
21 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-13 17:02:29,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2225010.0, ans=0.125 2024-08-13 17:02:30,609 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-13 17:02:31,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2225010.0, ans=0.0 2024-08-13 17:02:44,840 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 17:02:46,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2225110.0, ans=0.125 2024-08-13 17:02:51,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2225110.0, ans=0.2 2024-08-13 17:03:01,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2225210.0, ans=0.125 2024-08-13 17:03:04,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2225210.0, ans=0.125 2024-08-13 17:03:09,147 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-13 17:03:14,273 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 5150, loss[loss=0.1185, beats_loss=0.008773, ecapa_loss=0.0001784, whisper_loss=0.1079, over 15653.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01078, ecapa_loss=0.0001623, whisper_loss=0.09212, over 3878250.63 frames. ], batch size: 63, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:03:14,940 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.18 vs. 
limit=10.0 2024-08-13 17:03:29,600 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.384e+01 2.654e+01 2.972e+01 6.587e+01, threshold=5.307e+01, percent-clipped=1.0 2024-08-13 17:03:37,862 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.55 vs. limit=15.0 2024-08-13 17:03:44,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2225510.0, ans=0.07 2024-08-13 17:03:44,677 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.89 vs. limit=22.5 2024-08-13 17:03:53,910 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-13 17:04:00,369 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-13 17:04:03,187 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 16 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 17:04:26,336 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-13 17:04:28,244 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 5200, loss[loss=0.1041, beats_loss=0.01211, ecapa_loss=0.000148, whisper_loss=0.09049, over 22702.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01083, ecapa_loss=0.0001623, whisper_loss=0.09122, over 3876447.77 frames. ], batch size: 91, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:04:29,240 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.30 vs. 
limit=15.0 2024-08-13 17:04:32,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2225810.0, ans=0.0 2024-08-13 17:04:49,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2225910.0, ans=0.0 2024-08-13 17:04:58,283 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 21 from LS+wenet, 25 from Vox, 49 fro AS 2024-08-13 17:05:12,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2226110.0, ans=0.2 2024-08-13 17:05:41,218 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 5250, loss[loss=0.09854, beats_loss=0.01011, ecapa_loss=0.0001765, whisper_loss=0.08667, over 13739.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01084, ecapa_loss=0.0001616, whisper_loss=0.09139, over 3885563.47 frames. ], batch size: 54, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:05:46,360 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-13 17:05:58,278 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.945e+01 2.417e+01 2.576e+01 2.914e+01 4.655e+01, threshold=5.152e+01, percent-clipped=0.0 2024-08-13 17:06:28,562 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-13 17:06:34,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2226610.0, ans=0.125 2024-08-13 17:06:35,837 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 23 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-13 17:06:59,305 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 5300, loss[loss=0.124, beats_loss=0.008288, ecapa_loss=0.0001472, whisper_loss=0.1142, over 23951.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01073, ecapa_loss=0.0001628, whisper_loss=0.09166, over 3847855.79 frames. 
], batch size: 92, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:07:08,613 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-13 17:07:12,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2226810.0, ans=0.125 2024-08-13 17:07:13,403 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-13 17:07:15,656 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.43 vs. limit=15.0 2024-08-13 17:07:15,659 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.07 vs. limit=15.0 2024-08-13 17:07:20,183 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.15 vs. limit=22.5 2024-08-13 17:07:30,380 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 16 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-13 17:07:34,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2227010.0, ans=0.0 2024-08-13 17:07:46,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2227110.0, ans=0.125 2024-08-13 17:07:47,817 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-13 17:07:51,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2227110.0, ans=0.1 2024-08-13 17:07:52,984 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.63 vs. 
limit=22.5 2024-08-13 17:08:04,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2227210.0, ans=0.125 2024-08-13 17:08:07,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2227210.0, ans=0.125 2024-08-13 17:08:16,598 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 5350, loss[loss=0.09719, beats_loss=0.01153, ecapa_loss=0.0001402, whisper_loss=0.08427, over 22441.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01068, ecapa_loss=0.0001624, whisper_loss=0.09175, over 3827210.33 frames. ], batch size: 89, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:08:34,134 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.323e+01 2.552e+01 2.858e+01 4.460e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-13 17:08:38,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2227410.0, ans=0.1 2024-08-13 17:09:01,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2227510.0, ans=6.0 2024-08-13 17:09:03,027 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 18 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 17:09:16,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2227610.0, ans=0.2 2024-08-13 17:09:22,200 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.95 vs. limit=15.0 2024-08-13 17:09:29,699 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
32 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-13 17:09:35,295 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 5400, loss[loss=0.1191, beats_loss=0.01217, ecapa_loss=0.0001148, whisper_loss=0.1058, over 25401.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01068, ecapa_loss=0.0001627, whisper_loss=0.09162, over 3845422.41 frames. ], batch size: 94, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:09:36,683 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 17:09:40,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2227810.0, ans=0.0 2024-08-13 17:10:01,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2227910.0, ans=0.125 2024-08-13 17:10:34,943 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.51 vs. limit=10.0 2024-08-13 17:10:39,335 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 17:10:50,444 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-13 17:10:53,097 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 5450, loss[loss=0.1189, beats_loss=0.008862, ecapa_loss=0.0001877, whisper_loss=0.1082, over 22753.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01073, ecapa_loss=0.0001626, whisper_loss=0.09184, over 3873638.06 frames. 
], batch size: 90, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:11:01,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2228310.0, ans=0.0 2024-08-13 17:11:11,580 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.403e+01 2.600e+01 2.908e+01 1.736e+02, threshold=5.201e+01, percent-clipped=2.0 2024-08-13 17:11:13,619 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 33 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-13 17:11:35,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2228510.0, ans=0.1 2024-08-13 17:11:40,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2228610.0, ans=0.125 2024-08-13 17:11:43,602 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.49 vs. limit=15.0 2024-08-13 17:11:45,325 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.90 vs. limit=15.0 2024-08-13 17:12:03,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2228710.0, ans=0.1 2024-08-13 17:12:12,301 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 5500, loss[loss=0.08586, beats_loss=0.01121, ecapa_loss=0.0002026, whisper_loss=0.07262, over 14396.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01074, ecapa_loss=0.0001625, whisper_loss=0.09267, over 3898645.37 frames. 
], batch size: 61, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:12:27,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2228910.0, ans=0.5 2024-08-13 17:12:46,502 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2024-08-13 17:12:52,965 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2024-08-13 17:12:55,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2229010.0, ans=0.125 2024-08-13 17:12:58,079 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 29 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-13 17:13:27,354 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 22 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-13 17:13:29,460 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-13 17:13:30,605 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 5550, loss[loss=0.1093, beats_loss=0.0124, ecapa_loss=0.0001799, whisper_loss=0.09514, over 22053.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01079, ecapa_loss=0.000162, whisper_loss=0.09199, over 3898633.42 frames. ], batch size: 91, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:13:36,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2229310.0, ans=0.125 2024-08-13 17:13:51,537 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.725e+01 2.399e+01 2.709e+01 2.923e+01 5.241e+01, threshold=5.419e+01, percent-clipped=1.0 2024-08-13 17:14:09,287 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
24 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-13 17:14:15,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2229510.0, ans=0.125 2024-08-13 17:14:18,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2229510.0, ans=0.125 2024-08-13 17:14:33,829 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.73 vs. limit=5.0 2024-08-13 17:14:47,141 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2024-08-13 17:14:48,231 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=15.0 2024-08-13 17:14:51,234 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-13 17:14:53,629 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.64 vs. limit=15.0 2024-08-13 17:14:54,205 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 5600, loss[loss=0.08624, beats_loss=0.0108, ecapa_loss=0.0001643, whisper_loss=0.0738, over 17232.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01076, ecapa_loss=0.0001626, whisper_loss=0.09181, over 3914473.73 frames. ], batch size: 71, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:14:59,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2229810.0, ans=0.125 2024-08-13 17:15:09,564 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.07 vs. 
limit=12.0 2024-08-13 17:15:14,146 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.99 vs. limit=15.0 2024-08-13 17:15:22,830 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 15 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-13 17:15:25,813 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 22 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-13 17:15:28,836 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-13 17:15:39,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2230110.0, ans=0.125 2024-08-13 17:15:43,905 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 17:15:53,266 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-13 17:15:56,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2230210.0, ans=0.125 2024-08-13 17:15:59,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2230210.0, ans=0.125 2024-08-13 17:16:08,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=2230210.0, ans=0.05 2024-08-13 17:16:11,145 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 5650, loss[loss=0.1142, beats_loss=0.009689, ecapa_loss=0.0001929, whisper_loss=0.1026, over 21800.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01081, ecapa_loss=0.0001627, whisper_loss=0.09117, over 3908173.92 frames. 
], batch size: 91, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:16:29,594 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.435e+01 2.742e+01 3.035e+01 1.015e+02, threshold=5.483e+01, percent-clipped=1.0 2024-08-13 17:16:37,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2230410.0, ans=0.0 2024-08-13 17:16:47,968 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=22.5 2024-08-13 17:17:12,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2230710.0, ans=0.125 2024-08-13 17:17:12,955 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.64 vs. limit=12.0 2024-08-13 17:17:29,049 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 5700, loss[loss=0.105, beats_loss=0.01109, ecapa_loss=0.000161, whisper_loss=0.09231, over 22910.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01073, ecapa_loss=0.0001626, whisper_loss=0.09178, over 3921456.47 frames. ], batch size: 93, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:17:39,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2230810.0, ans=0.2 2024-08-13 17:17:51,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2230910.0, ans=0.0 2024-08-13 17:18:21,017 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 29 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-13 17:18:31,105 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
26 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 17:18:47,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=2231310.0, ans=0.2 2024-08-13 17:18:48,332 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 5750, loss[loss=0.08294, beats_loss=0.01401, ecapa_loss=0.0001735, whisper_loss=0.06719, over 21477.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01082, ecapa_loss=0.0001632, whisper_loss=0.09135, over 3892296.15 frames. ], batch size: 93, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:19:05,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2231410.0, ans=0.1 2024-08-13 17:19:07,391 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.664e+01 2.349e+01 2.635e+01 2.966e+01 1.104e+02, threshold=5.269e+01, percent-clipped=1.0 2024-08-13 17:19:25,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2231510.0, ans=0.125 2024-08-13 17:19:30,010 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 17:19:34,656 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-13 17:19:36,177 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 20 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-13 17:19:40,580 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-13 17:19:46,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2231610.0, ans=0.07 2024-08-13 17:20:05,064 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 5800, loss[loss=0.1165, beats_loss=0.009779, ecapa_loss=0.0001417, whisper_loss=0.1053, over 24894.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01076, ecapa_loss=0.0001636, whisper_loss=0.0915, over 3887007.33 frames. ], batch size: 92, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:20:08,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2231810.0, ans=0.1 2024-08-13 17:20:19,637 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 16 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-13 17:20:24,555 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.83 vs. limit=15.0 2024-08-13 17:20:52,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2232110.0, ans=0.125 2024-08-13 17:20:54,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2232110.0, ans=0.1 2024-08-13 17:21:09,064 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.93 vs. limit=6.0 2024-08-13 17:21:10,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2232210.0, ans=0.0 2024-08-13 17:21:17,410 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 17:21:20,022 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 5850, loss[loss=0.09593, beats_loss=0.009783, ecapa_loss=0.0001734, whisper_loss=0.08441, over 21552.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01077, ecapa_loss=0.0001624, whisper_loss=0.09119, over 3882203.29 frames. 
], batch size: 88, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:21:20,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2232310.0, ans=0.125 2024-08-13 17:21:29,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2232310.0, ans=0.125 2024-08-13 17:21:31,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2232310.0, ans=0.07 2024-08-13 17:21:34,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2232410.0, ans=0.125 2024-08-13 17:21:37,614 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.300e+01 2.602e+01 2.847e+01 5.570e+01, threshold=5.204e+01, percent-clipped=1.0 2024-08-13 17:21:38,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2232410.0, ans=0.125 2024-08-13 17:21:46,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2232410.0, ans=0.2 2024-08-13 17:21:51,411 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 21 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 17:22:09,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2232610.0, ans=0.0 2024-08-13 17:22:13,380 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-13 17:22:32,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2232810.0, ans=0.5 2024-08-13 17:22:33,012 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 5900, loss[loss=0.1294, beats_loss=0.008284, ecapa_loss=0.0001974, whisper_loss=0.1191, over 21507.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01077, ecapa_loss=0.0001638, whisper_loss=0.09078, over 3838871.24 frames. ], batch size: 88, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:22:46,238 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.02 vs. limit=10.0 2024-08-13 17:22:56,298 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 11 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-13 17:23:04,163 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2024-08-13 17:23:29,889 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.18 vs. limit=15.0 2024-08-13 17:23:38,932 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 12 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-13 17:23:43,625 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 17:23:44,683 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 5950, loss[loss=0.09894, beats_loss=0.01094, ecapa_loss=0.0001747, whisper_loss=0.08626, over 21451.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01078, ecapa_loss=0.0001639, whisper_loss=0.09088, over 3827298.25 frames. ], batch size: 87, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:24:01,018 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.407e+01 2.699e+01 3.092e+01 2.272e+02, threshold=5.398e+01, percent-clipped=4.0 2024-08-13 17:24:06,245 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.25 vs. 
limit=15.0 2024-08-13 17:24:10,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2233410.0, ans=0.1 2024-08-13 17:24:11,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2233510.0, ans=0.1 2024-08-13 17:24:16,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2233510.0, ans=0.0 2024-08-13 17:24:23,266 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.22 vs. limit=10.0 2024-08-13 17:24:48,616 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 38 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-13 17:24:50,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2233710.0, ans=0.1 2024-08-13 17:24:57,909 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 6000, loss[loss=0.09543, beats_loss=0.01324, ecapa_loss=0.0001441, whisper_loss=0.08075, over 22437.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01093, ecapa_loss=0.0001622, whisper_loss=0.09012, over 3863680.19 frames. ], batch size: 91, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:24:57,909 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-13 17:25:32,933 INFO [train_multi_KD3.py:1149] (2/4) Epoch 16, validation on ASR_libri: loss=0.2531, beats_loss=0, ecapa_loss=0.0005624, whisper_loss=0.2475, over 922467.00 frames. 2024-08-13 17:25:51,973 INFO [train_multi_KD3.py:1149] (2/4) Epoch 16, validation on SV_voxceleb1: loss=0.004549, beats_loss=0, ecapa_loss=0.0004549, whisper_loss=0, over 939242.00 frames. 
2024-08-13 17:27:33,683 INFO [train_multi_KD3.py:1149] (2/4) Epoch 16, validation on AT_audioset: loss=0.02369, beats_loss=0.02369, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 17:27:33,687 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-13 17:27:55,117 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-13 17:28:05,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2234010.0, ans=0.07 2024-08-13 17:28:13,344 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.317e+00 2024-08-13 17:28:15,693 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2234010.0, ans=0.125 2024-08-13 17:28:28,691 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.72 vs. limit=22.5 2024-08-13 17:28:33,519 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 17:28:47,971 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 6050, loss[loss=0.09642, beats_loss=0.01223, ecapa_loss=0.0001211, whisper_loss=0.08298, over 16637.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01086, ecapa_loss=0.0001617, whisper_loss=0.09104, over 3869066.68 frames. ], batch size: 63, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:28:52,090 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 29 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-13 17:28:52,694 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.57 vs. 
limit=15.0 2024-08-13 17:29:04,770 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.253e+00 2024-08-13 17:29:05,700 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-13 17:29:06,648 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.686e+01 2.365e+01 2.582e+01 2.840e+01 3.927e+01, threshold=5.165e+01, percent-clipped=0.0 2024-08-13 17:29:29,854 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 17:29:40,160 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-13 17:29:47,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2234710.0, ans=0.025 2024-08-13 17:30:03,703 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 6100, loss[loss=0.1064, beats_loss=0.0109, ecapa_loss=0.0001611, whisper_loss=0.09389, over 17747.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01086, ecapa_loss=0.0001632, whisper_loss=0.09089, over 3877008.40 frames. ], batch size: 72, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:30:21,709 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-13 17:30:25,571 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.13 vs. limit=8.0 2024-08-13 17:30:26,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=2234910.0, ans=0.1 2024-08-13 17:30:27,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2234910.0, ans=0.125 2024-08-13 17:30:33,103 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
34 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-13 17:30:38,294 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.73 vs. limit=15.0 2024-08-13 17:30:48,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2235110.0, ans=0.125 2024-08-13 17:31:05,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2235210.0, ans=0.125 2024-08-13 17:31:15,451 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 6150, loss[loss=0.1086, beats_loss=0.01073, ecapa_loss=0.0001639, whisper_loss=0.09625, over 21741.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01085, ecapa_loss=0.0001635, whisper_loss=0.0906, over 3845155.28 frames. ], batch size: 87, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:31:19,141 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 18 from LS+wenet, 28 from Vox, 25 fro AS 2024-08-13 17:31:20,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2235310.0, ans=0.125 2024-08-13 17:31:33,224 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.323e+01 2.638e+01 2.979e+01 5.632e+01, threshold=5.276e+01, percent-clipped=1.0 2024-08-13 17:31:43,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2235410.0, ans=0.025 2024-08-13 17:32:12,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2235610.0, ans=0.1 2024-08-13 17:32:29,450 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 6200, loss[loss=0.1105, beats_loss=0.009498, ecapa_loss=0.0001441, whisper_loss=0.09959, over 18953.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01082, ecapa_loss=0.0001624, whisper_loss=0.09107, over 3861715.82 frames. ], batch size: 73, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:32:45,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2235910.0, ans=0.0 2024-08-13 17:32:59,663 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.90 vs. limit=15.0 2024-08-13 17:33:07,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2236010.0, ans=0.125 2024-08-13 17:33:15,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2236110.0, ans=0.125 2024-08-13 17:33:18,038 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.01 vs. limit=6.0 2024-08-13 17:33:24,142 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.60 vs. limit=15.0 2024-08-13 17:33:43,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2236210.0, ans=0.125 2024-08-13 17:33:48,081 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 6250, loss[loss=0.1116, beats_loss=0.01098, ecapa_loss=0.000164, whisper_loss=0.09897, over 14619.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01086, ecapa_loss=0.0001624, whisper_loss=0.09087, over 3858593.31 frames. ], batch size: 58, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:34:03,406 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
18 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-13 17:34:05,988 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.429e+01 2.660e+01 2.868e+01 5.842e+01, threshold=5.321e+01, percent-clipped=1.0 2024-08-13 17:34:09,263 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 36 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-13 17:34:20,685 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 13 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-13 17:34:46,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2236610.0, ans=0.2 2024-08-13 17:35:04,438 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 6300, loss[loss=0.09876, beats_loss=0.01411, ecapa_loss=0.0001534, whisper_loss=0.08312, over 22560.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01088, ecapa_loss=0.0001631, whisper_loss=0.09069, over 3849412.96 frames. ], batch size: 94, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:35:05,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2236810.0, ans=0.125 2024-08-13 17:35:27,664 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 26 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-13 17:35:40,529 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-13 17:35:47,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2237010.0, ans=0.04949747468305833 2024-08-13 17:35:56,585 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-13 17:36:02,489 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 21 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 17:36:07,128 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
22 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-13 17:36:18,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2237210.0, ans=0.125 2024-08-13 17:36:22,919 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 6350, loss[loss=0.1155, beats_loss=0.01057, ecapa_loss=0.0001482, whisper_loss=0.1034, over 22524.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0109, ecapa_loss=0.000163, whisper_loss=0.09063, over 3808714.92 frames. ], batch size: 91, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:36:36,098 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 15 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-13 17:36:37,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2237410.0, ans=0.0 2024-08-13 17:36:40,835 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 20 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-13 17:36:42,417 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.465e+01 2.710e+01 3.055e+01 1.101e+02, threshold=5.419e+01, percent-clipped=2.0 2024-08-13 17:36:42,617 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-13 17:37:45,744 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 6400, loss[loss=0.09709, beats_loss=0.01322, ecapa_loss=0.0001393, whisper_loss=0.08248, over 22501.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01089, ecapa_loss=0.0001637, whisper_loss=0.09117, over 3838352.68 frames. ], batch size: 92, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:37:48,435 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.48 vs. 
limit=22.5 2024-08-13 17:37:51,381 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.25 vs. limit=22.5 2024-08-13 17:38:04,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2237910.0, ans=0.04949747468305833 2024-08-13 17:38:14,083 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=12.0 2024-08-13 17:38:15,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2237910.0, ans=0.1 2024-08-13 17:38:16,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2238010.0, ans=0.2 2024-08-13 17:38:26,088 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.59 vs. limit=10.0 2024-08-13 17:39:03,213 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 6450, loss[loss=0.1153, beats_loss=0.009779, ecapa_loss=0.0001558, whisper_loss=0.104, over 22536.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01086, ecapa_loss=0.0001632, whisper_loss=0.09119, over 3865486.56 frames. ], batch size: 88, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:39:05,524 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=12.0 2024-08-13 17:39:16,542 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
25 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-13 17:39:16,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2238310.0, ans=0.0 2024-08-13 17:39:22,728 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.439e+01 2.707e+01 3.110e+01 4.905e+01, threshold=5.413e+01, percent-clipped=0.0 2024-08-13 17:39:28,392 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2024-08-13 17:39:48,045 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-13 17:40:05,883 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.48 vs. limit=15.0 2024-08-13 17:40:06,878 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 17:40:22,072 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 6500, loss[loss=0.1222, beats_loss=0.01074, ecapa_loss=0.0001619, whisper_loss=0.1098, over 21650.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01079, ecapa_loss=0.000164, whisper_loss=0.09218, over 3876515.12 frames. ], batch size: 85, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:40:22,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2238810.0, ans=0.125 2024-08-13 17:40:31,646 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0 2024-08-13 17:40:36,790 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
26 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-13 17:40:38,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2238910.0, ans=0.125 2024-08-13 17:40:42,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2238910.0, ans=0.2 2024-08-13 17:40:48,263 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.11 vs. limit=15.0 2024-08-13 17:40:58,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2239010.0, ans=0.05 2024-08-13 17:41:14,161 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-13 17:41:27,292 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 31 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-13 17:41:39,269 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 6550, loss[loss=0.1104, beats_loss=0.01074, ecapa_loss=9.725e-05, whisper_loss=0.09865, over 15279.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0108, ecapa_loss=0.0001641, whisper_loss=0.09237, over 3889727.75 frames. ], batch size: 54, lr: 3.92e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:41:46,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2239310.0, ans=0.1 2024-08-13 17:41:49,404 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.19 vs. 
limit=6.0 2024-08-13 17:41:52,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2239310.0, ans=0.125 2024-08-13 17:41:57,798 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.430e+01 2.695e+01 2.938e+01 3.674e+01, threshold=5.390e+01, percent-clipped=0.0 2024-08-13 17:42:09,285 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.02 vs. limit=22.5 2024-08-13 17:42:15,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2239510.0, ans=0.1 2024-08-13 17:42:22,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2239510.0, ans=0.2 2024-08-13 17:42:28,017 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-13 17:42:35,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2239610.0, ans=0.125 2024-08-13 17:42:51,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2239710.0, ans=0.07 2024-08-13 17:42:58,575 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 6600, loss[loss=0.1206, beats_loss=0.01023, ecapa_loss=0.0001532, whisper_loss=0.1089, over 18430.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01078, ecapa_loss=0.0001643, whisper_loss=0.09273, over 3950945.17 frames. ], batch size: 70, lr: 3.92e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:43:04,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2239810.0, ans=0.0 2024-08-13 17:43:15,775 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
19 from LS+wenet, 23 from Vox, 52 fro AS 2024-08-13 17:43:24,434 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 31 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-13 17:43:41,838 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 22 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-13 17:43:56,769 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-13 17:43:58,987 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 30 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 17:44:06,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2240110.0, ans=0.125 2024-08-13 17:44:26,124 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 6650, loss[loss=0.09317, beats_loss=0.01252, ecapa_loss=0.0001363, whisper_loss=0.07929, over 13796.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01078, ecapa_loss=0.0001637, whisper_loss=0.09219, over 3916822.67 frames. ], batch size: 56, lr: 3.92e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:44:32,755 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=22.5 2024-08-13 17:44:46,503 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.019e+01 2.433e+01 2.609e+01 2.879e+01 3.999e+01, threshold=5.218e+01, percent-clipped=0.0 2024-08-13 17:44:51,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2240410.0, ans=0.1 2024-08-13 17:44:55,231 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 23 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-13 17:45:14,992 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
26 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-13 17:45:19,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2240610.0, ans=0.1 2024-08-13 17:45:28,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2240610.0, ans=0.0 2024-08-13 17:45:39,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2240710.0, ans=0.125 2024-08-13 17:45:50,200 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 6700, loss[loss=0.1009, beats_loss=0.00961, ecapa_loss=0.0002037, whisper_loss=0.08921, over 20403.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01083, ecapa_loss=0.0001635, whisper_loss=0.09233, over 3902868.31 frames. ], batch size: 85, lr: 3.92e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:46:27,316 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 28 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-13 17:47:10,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2241210.0, ans=0.125 2024-08-13 17:47:13,344 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-13 17:47:15,821 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 6750, loss[loss=0.09689, beats_loss=0.01212, ecapa_loss=0.0001807, whisper_loss=0.08296, over 21186.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01082, ecapa_loss=0.0001633, whisper_loss=0.09255, over 3919649.84 frames. ], batch size: 89, lr: 3.92e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:47:21,196 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
37 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 17:47:27,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2241310.0, ans=0.125 2024-08-13 17:47:28,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2241310.0, ans=0.09899494936611666 2024-08-13 17:47:37,466 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.480e+01 2.825e+01 3.141e+01 1.321e+02, threshold=5.651e+01, percent-clipped=2.0 2024-08-13 17:48:00,648 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 17:48:05,907 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.50 vs. limit=10.0 2024-08-13 17:48:07,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2241610.0, ans=0.0 2024-08-13 17:48:15,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2241610.0, ans=0.125 2024-08-13 17:48:31,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2241710.0, ans=0.0 2024-08-13 17:48:34,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2241710.0, ans=0.125 2024-08-13 17:48:38,723 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 6800, loss[loss=0.1113, beats_loss=0.009848, ecapa_loss=0.0001758, whisper_loss=0.09968, over 22103.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0108, ecapa_loss=0.0001644, whisper_loss=0.09189, over 3924596.90 frames. ], batch size: 88, lr: 3.92e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:48:50,565 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
31 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-13 17:49:01,687 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-13 17:49:02,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2241910.0, ans=0.125 2024-08-13 17:49:03,168 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 17:49:12,010 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 21 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-13 17:49:23,969 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.53 vs. limit=15.0 2024-08-13 17:49:25,887 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.56 vs. limit=15.0 2024-08-13 17:49:37,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2242110.0, ans=0.5 2024-08-13 17:49:45,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2242210.0, ans=0.0 2024-08-13 17:49:49,625 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 25 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-13 17:50:00,772 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 6850, loss[loss=0.09501, beats_loss=0.01057, ecapa_loss=0.000161, whisper_loss=0.08283, over 18338.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01077, ecapa_loss=0.0001648, whisper_loss=0.09143, over 3879007.29 frames. 
], batch size: 73, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 17:50:04,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2242310.0, ans=0.125 2024-08-13 17:50:06,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2242310.0, ans=0.125 2024-08-13 17:50:11,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2242310.0, ans=0.125 2024-08-13 17:50:12,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2242310.0, ans=0.0 2024-08-13 17:50:20,686 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.361e+01 2.636e+01 2.867e+01 1.284e+02, threshold=5.272e+01, percent-clipped=1.0 2024-08-13 17:50:29,424 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-13 17:50:35,523 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-13 17:50:46,353 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=15.0 2024-08-13 17:50:57,342 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 17:51:16,143 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 17:51:19,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2242810.0, ans=0.125 2024-08-13 17:51:20,991 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 6900, loss[loss=0.1053, beats_loss=0.01143, ecapa_loss=0.0001682, whisper_loss=0.09222, over 18182.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01084, ecapa_loss=0.0001636, whisper_loss=0.09036, over 3849369.02 frames. ], batch size: 76, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 17:51:38,829 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-13 17:51:51,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2243010.0, ans=0.2 2024-08-13 17:51:55,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2243010.0, ans=0.1 2024-08-13 17:52:01,128 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 30 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-13 17:52:17,045 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-13 17:52:42,048 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 6950, loss[loss=0.0895, beats_loss=0.01251, ecapa_loss=0.0001652, whisper_loss=0.07534, over 15393.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01096, ecapa_loss=0.0001621, whisper_loss=0.08986, over 3855795.22 frames. ], batch size: 62, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 17:52:42,245 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 17:53:00,184 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.43 vs. limit=22.5 2024-08-13 17:53:02,730 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.690e+01 2.338e+01 2.546e+01 2.937e+01 5.530e+01, threshold=5.093e+01, percent-clipped=1.0 2024-08-13 17:53:30,886 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.17 vs. 
limit=15.0 2024-08-13 17:53:49,171 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.38 vs. limit=12.0 2024-08-13 17:53:56,010 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.58 vs. limit=15.0 2024-08-13 17:54:03,716 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 7000, loss[loss=0.1073, beats_loss=0.01006, ecapa_loss=0.0001672, whisper_loss=0.09561, over 21414.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01093, ecapa_loss=0.0001621, whisper_loss=0.09046, over 3873009.53 frames. ], batch size: 88, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 17:54:04,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=2243810.0, ans=0.05 2024-08-13 17:54:05,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2243810.0, ans=0.0 2024-08-13 17:54:35,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2244010.0, ans=0.05 2024-08-13 17:54:40,347 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-13 17:54:47,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2244010.0, ans=0.125 2024-08-13 17:54:53,535 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 28 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-13 17:55:00,819 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 20 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-13 17:55:02,282 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
25 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-13 17:55:02,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2244110.0, ans=0.125 2024-08-13 17:55:02,899 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.57 vs. limit=15.0 2024-08-13 17:55:26,838 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 7050, loss[loss=0.09458, beats_loss=0.01324, ecapa_loss=0.0001513, whisper_loss=0.07983, over 18901.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01093, ecapa_loss=0.0001635, whisper_loss=0.09017, over 3842594.18 frames. ], batch size: 76, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 17:55:30,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2244310.0, ans=0.0 2024-08-13 17:55:33,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2244310.0, ans=0.125 2024-08-13 17:55:48,079 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+01 2.461e+01 2.675e+01 2.991e+01 1.291e+02, threshold=5.351e+01, percent-clipped=1.0 2024-08-13 17:55:57,418 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 16 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-13 17:56:06,833 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
26 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-13 17:56:20,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2244610.0, ans=0.125 2024-08-13 17:56:21,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2244610.0, ans=0.125 2024-08-13 17:56:25,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2244610.0, ans=0.2 2024-08-13 17:56:31,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2244710.0, ans=0.2 2024-08-13 17:56:36,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=2244710.0, ans=0.95 2024-08-13 17:56:38,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2244710.0, ans=0.2 2024-08-13 17:56:41,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2244710.0, ans=0.0 2024-08-13 17:56:47,656 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 7100, loss[loss=0.1072, beats_loss=0.01125, ecapa_loss=0.0001376, whisper_loss=0.09458, over 23575.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0109, ecapa_loss=0.000163, whisper_loss=0.09068, over 3843580.13 frames. ], batch size: 94, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 17:56:53,816 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
18 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-13 17:56:56,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2244810.0, ans=0.125 2024-08-13 17:57:02,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2244910.0, ans=0.125 2024-08-13 17:57:12,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2244910.0, ans=0.0 2024-08-13 17:57:24,599 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.47 vs. limit=10.0 2024-08-13 17:57:35,437 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.64 vs. limit=10.0 2024-08-13 17:58:08,750 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 7150, loss[loss=0.1055, beats_loss=0.01149, ecapa_loss=0.0001684, whisper_loss=0.09233, over 14797.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01093, ecapa_loss=0.0001628, whisper_loss=0.09009, over 3848697.18 frames. 
], batch size: 61, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 17:58:31,358 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.399e+01 2.676e+01 3.068e+01 5.307e+01, threshold=5.353e+01, percent-clipped=0.0 2024-08-13 17:58:40,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2245410.0, ans=0.1 2024-08-13 17:58:42,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2245510.0, ans=0.0 2024-08-13 17:58:48,333 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.84 vs. limit=15.0 2024-08-13 17:59:25,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2245710.0, ans=0.2 2024-08-13 17:59:31,481 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-13 17:59:33,164 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 7200, loss[loss=0.09581, beats_loss=0.01151, ecapa_loss=0.0001731, whisper_loss=0.08257, over 19107.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01093, ecapa_loss=0.0001627, whisper_loss=0.09022, over 3859024.88 frames. ], batch size: 77, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 17:59:38,490 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
32 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-13 17:59:48,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2245910.0, ans=0.1 2024-08-13 17:59:48,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2245910.0, ans=0.125 2024-08-13 18:00:10,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2246010.0, ans=0.125 2024-08-13 18:00:13,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2246010.0, ans=0.1 2024-08-13 18:00:13,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2246010.0, ans=0.0 2024-08-13 18:00:18,109 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 18:00:23,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2246110.0, ans=0.2 2024-08-13 18:00:41,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2246210.0, ans=0.125 2024-08-13 18:00:51,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2246210.0, ans=0.07 2024-08-13 18:00:51,807 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.90 vs. limit=10.0 2024-08-13 18:00:53,818 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 7250, loss[loss=0.1145, beats_loss=0.01182, ecapa_loss=0.0001181, whisper_loss=0.1015, over 24570.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01083, ecapa_loss=0.0001627, whisper_loss=0.09115, over 3870580.71 frames. 
], batch size: 92, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:01:01,022 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.51 vs. limit=6.0 2024-08-13 18:01:02,642 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 27 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 18:01:15,024 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.504e+01 2.815e+01 3.088e+01 1.145e+02, threshold=5.629e+01, percent-clipped=1.0 2024-08-13 18:01:16,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2246410.0, ans=0.2 2024-08-13 18:02:04,699 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-13 18:02:15,658 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 7300, loss[loss=0.1036, beats_loss=0.01277, ecapa_loss=0.0001546, whisper_loss=0.08931, over 21758.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01086, ecapa_loss=0.0001619, whisper_loss=0.09122, over 3891016.84 frames. ], batch size: 87, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:02:20,336 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.45 vs. limit=15.0 2024-08-13 18:02:27,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2246810.0, ans=0.1 2024-08-13 18:02:36,035 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
29 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-13 18:02:53,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2247010.0, ans=0.1 2024-08-13 18:03:11,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2247110.0, ans=0.125 2024-08-13 18:03:15,528 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 22 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-13 18:03:15,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2247110.0, ans=0.0 2024-08-13 18:03:17,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2247110.0, ans=0.125 2024-08-13 18:03:17,367 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.226e-01 2024-08-13 18:03:24,152 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 24 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 18:03:27,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2247210.0, ans=0.125 2024-08-13 18:03:27,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2247210.0, ans=0.025 2024-08-13 18:03:31,978 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.02 vs. limit=15.0 2024-08-13 18:03:36,860 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 7350, loss[loss=0.1135, beats_loss=0.01191, ecapa_loss=0.0001511, whisper_loss=0.1001, over 22793.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0109, ecapa_loss=0.0001616, whisper_loss=0.09126, over 3889804.51 frames. 
], batch size: 89, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:03:38,016 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.92 vs. limit=6.0 2024-08-13 18:03:58,282 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.974e+01 2.425e+01 2.697e+01 3.109e+01 4.252e+01, threshold=5.395e+01, percent-clipped=0.0 2024-08-13 18:04:07,742 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 21 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-13 18:04:12,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2247510.0, ans=0.0 2024-08-13 18:04:20,501 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-13 18:04:23,264 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-13 18:04:43,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2247710.0, ans=0.2 2024-08-13 18:04:44,214 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-13 18:04:48,051 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 20 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-13 18:04:55,119 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 18:04:59,180 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 7400, loss[loss=0.09065, beats_loss=0.01299, ecapa_loss=0.0001619, whisper_loss=0.07604, over 16698.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01087, ecapa_loss=0.0001619, whisper_loss=0.09106, over 3882523.25 frames. ], batch size: 72, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:04:59,280 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
22 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-13 18:05:13,589 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-13 18:05:19,795 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 42 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-13 18:05:25,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2247910.0, ans=0.1 2024-08-13 18:05:30,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2248010.0, ans=0.1 2024-08-13 18:05:35,179 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 20 from LS+wenet, 17 from Vox, 53 fro AS 2024-08-13 18:05:38,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2248010.0, ans=0.1 2024-08-13 18:05:51,351 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.72 vs. limit=15.0 2024-08-13 18:06:15,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2248210.0, ans=0.0 2024-08-13 18:06:17,276 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 7450, loss[loss=0.1046, beats_loss=0.01142, ecapa_loss=0.0001745, whisper_loss=0.09142, over 20869.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01089, ecapa_loss=0.0001621, whisper_loss=0.091, over 3903690.95 frames. ], batch size: 84, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:06:37,377 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.475e+01 2.713e+01 3.023e+01 4.384e+01, threshold=5.427e+01, percent-clipped=0.0 2024-08-13 18:06:44,374 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
24 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-13 18:06:46,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2248410.0, ans=0.125 2024-08-13 18:06:55,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2248510.0, ans=0.125 2024-08-13 18:06:55,692 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.36 vs. limit=10.0 2024-08-13 18:06:58,362 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-13 18:07:25,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2248710.0, ans=0.125 2024-08-13 18:07:37,739 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 7500, loss[loss=0.06244, beats_loss=0.01297, ecapa_loss=0.0001955, whisper_loss=0.04751, over 18686.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01091, ecapa_loss=0.0001628, whisper_loss=0.09035, over 3895503.47 frames. ], batch size: 84, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:07:53,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2248910.0, ans=0.1 2024-08-13 18:08:21,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2249010.0, ans=0.1 2024-08-13 18:08:21,832 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.23 vs. 
limit=15.0 2024-08-13 18:08:31,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2249110.0, ans=0.0 2024-08-13 18:08:37,106 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-13 18:08:58,562 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 7550, loss[loss=0.1079, beats_loss=0.0113, ecapa_loss=0.0001545, whisper_loss=0.09508, over 23071.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01083, ecapa_loss=0.0001618, whisper_loss=0.09092, over 3871006.27 frames. ], batch size: 93, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:09:19,253 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.370e+01 2.691e+01 3.011e+01 5.049e+01, threshold=5.381e+01, percent-clipped=0.0 2024-08-13 18:09:33,061 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-13 18:09:46,247 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.05 vs. limit=22.5 2024-08-13 18:09:49,302 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.28 vs. limit=10.0 2024-08-13 18:10:06,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2249710.0, ans=0.125 2024-08-13 18:10:17,700 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 7600, loss[loss=0.1114, beats_loss=0.0112, ecapa_loss=0.000147, whisper_loss=0.09873, over 20783.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01083, ecapa_loss=0.000162, whisper_loss=0.0905, over 3846989.81 frames. 
], batch size: 81, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:10:20,348 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.89 vs. limit=15.0 2024-08-13 18:10:23,208 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-13 18:10:38,100 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 18:10:50,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2250010.0, ans=0.0 2024-08-13 18:11:17,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2250110.0, ans=0.125 2024-08-13 18:11:30,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2250210.0, ans=0.125 2024-08-13 18:11:37,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2250310.0, ans=0.1 2024-08-13 18:11:38,294 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 7650, loss[loss=0.1184, beats_loss=0.01012, ecapa_loss=0.0001461, whisper_loss=0.1068, over 23197.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01073, ecapa_loss=0.0001617, whisper_loss=0.09058, over 3841882.86 frames. ], batch size: 90, lr: 3.92e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:11:46,192 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
15 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-13 18:11:57,714 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.418e+01 2.664e+01 3.048e+01 4.401e+01, threshold=5.328e+01, percent-clipped=0.0 2024-08-13 18:12:26,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2250610.0, ans=0.125 2024-08-13 18:12:34,807 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-13 18:12:40,680 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-13 18:12:52,368 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 7700, loss[loss=0.1058, beats_loss=0.01149, ecapa_loss=0.0001483, whisper_loss=0.09287, over 19377.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01079, ecapa_loss=0.0001619, whisper_loss=0.09019, over 3838287.32 frames. ], batch size: 75, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:12:57,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2250810.0, ans=0.125 2024-08-13 18:13:00,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2250810.0, ans=0.1 2024-08-13 18:13:37,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2251110.0, ans=0.1 2024-08-13 18:13:40,156 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 13 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-13 18:13:52,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2251210.0, ans=0.0 2024-08-13 18:14:05,676 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 7750, loss[loss=0.1007, beats_loss=0.01111, ecapa_loss=0.0001938, whisper_loss=0.08768, over 21702.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01078, ecapa_loss=0.0001628, whisper_loss=0.09013, over 3838016.49 frames. ], batch size: 93, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:14:24,177 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.510e+01 2.727e+01 3.092e+01 1.354e+02, threshold=5.455e+01, percent-clipped=2.0 2024-08-13 18:14:35,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2251510.0, ans=0.2 2024-08-13 18:14:48,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2251610.0, ans=0.125 2024-08-13 18:14:53,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2251610.0, ans=0.025 2024-08-13 18:14:54,233 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.23 vs. limit=15.0 2024-08-13 18:14:58,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2251610.0, ans=0.0 2024-08-13 18:15:00,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2251610.0, ans=0.0 2024-08-13 18:15:02,451 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-13 18:15:03,055 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.21 vs. limit=12.0 2024-08-13 18:15:17,779 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 7800, loss[loss=0.1213, beats_loss=0.008052, ecapa_loss=0.0001785, whisper_loss=0.1115, over 16419.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01079, ecapa_loss=0.0001625, whisper_loss=0.0907, over 3868250.87 frames. 
], batch size: 64, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:15:21,585 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.40 vs. limit=15.0 2024-08-13 18:15:36,302 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 26 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-13 18:15:45,690 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-13 18:15:47,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2252010.0, ans=0.0 2024-08-13 18:15:50,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2252010.0, ans=0.95 2024-08-13 18:15:56,708 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-13 18:16:20,840 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 12 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-13 18:16:23,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2252210.0, ans=0.0 2024-08-13 18:16:30,423 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 7850, loss[loss=0.08648, beats_loss=0.01076, ecapa_loss=0.0001704, whisper_loss=0.07402, over 16503.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01084, ecapa_loss=0.000162, whisper_loss=0.09021, over 3883538.43 frames. ], batch size: 68, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:16:38,235 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.72 vs. 
limit=22.5 2024-08-13 18:16:45,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2252410.0, ans=0.0 2024-08-13 18:16:46,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2252410.0, ans=0.125 2024-08-13 18:16:47,518 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-13 18:16:48,526 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.348e+01 2.635e+01 3.053e+01 4.732e+01, threshold=5.269e+01, percent-clipped=0.0 2024-08-13 18:16:53,276 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-13 18:17:03,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2252510.0, ans=0.125 2024-08-13 18:17:13,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2252610.0, ans=0.125 2024-08-13 18:17:29,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2252710.0, ans=10.0 2024-08-13 18:17:44,072 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 7900, loss[loss=0.08756, beats_loss=0.01312, ecapa_loss=0.0001527, whisper_loss=0.07291, over 20717.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01087, ecapa_loss=0.0001623, whisper_loss=0.09024, over 3871335.05 frames. ], batch size: 90, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:17:54,279 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.36 vs. 
limit=12.0 2024-08-13 18:18:06,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2252910.0, ans=0.125 2024-08-13 18:18:26,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2253110.0, ans=0.125 2024-08-13 18:18:48,898 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 22 from Vox, 26 from AS 2024-08-13 18:18:49,646 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.07 vs. limit=6.0 2024-08-13 18:18:57,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=2253310.0, ans=15.0 2024-08-13 18:18:57,420 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 7950, loss[loss=0.1357, beats_loss=0.009171, ecapa_loss=0.0001774, whisper_loss=0.1247, over 14174.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01093, ecapa_loss=0.0001628, whisper_loss=0.09029, over 3881832.46 frames. 
], batch size: 54, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:19:00,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2253310.0, ans=0.0 2024-08-13 18:19:05,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2253310.0, ans=0.125 2024-08-13 18:19:15,899 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.363e+01 2.645e+01 3.044e+01 5.205e+01, threshold=5.290e+01, percent-clipped=0.0 2024-08-13 18:19:20,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2253410.0, ans=0.125 2024-08-13 18:19:20,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2253410.0, ans=0.125 2024-08-13 18:19:32,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2253510.0, ans=0.0 2024-08-13 18:19:39,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2253510.0, ans=0.2 2024-08-13 18:19:49,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2253610.0, ans=0.0 2024-08-13 18:19:59,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2253710.0, ans=0.2 2024-08-13 18:20:01,475 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2253710.0, ans=0.1 2024-08-13 18:20:10,398 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
34 from LS+wenet, 22 from Vox, 33 from AS 2024-08-13 18:20:10,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2253710.0, ans=0.0 2024-08-13 18:20:10,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2253710.0, ans=0.0 2024-08-13 18:20:13,349 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 8000, loss[loss=0.1241, beats_loss=0.009531, ecapa_loss=0.0001787, whisper_loss=0.1128, over 21921.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01088, ecapa_loss=0.0001629, whisper_loss=0.09068, over 3883700.18 frames. ], batch size: 88, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:20:43,925 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 21 from LS+wenet, 13 from Vox, 26 from AS 2024-08-13 18:20:44,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2254010.0, ans=0.1 2024-08-13 18:20:50,673 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 13 from Vox, 32 from AS 2024-08-13 18:20:52,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2254010.0, ans=0.0 2024-08-13 18:21:24,880 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.28 vs. limit=12.0 2024-08-13 18:21:26,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2254310.0, ans=0.125 2024-08-13 18:21:26,834 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 8050, loss[loss=0.118, beats_loss=0.009797, ecapa_loss=0.0002297, whisper_loss=0.1059, over 21252.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01091, ecapa_loss=0.0001614, whisper_loss=0.09066, over 3848648.94 frames. 
], batch size: 92, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:21:33,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2254310.0, ans=0.125 2024-08-13 18:21:46,278 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.332e+01 2.558e+01 3.003e+01 5.582e+01, threshold=5.115e+01, percent-clipped=1.0 2024-08-13 18:21:48,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2254410.0, ans=0.09899494936611666 2024-08-13 18:21:52,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2254410.0, ans=0.04949747468305833 2024-08-13 18:22:03,959 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.58 vs. limit=10.0 2024-08-13 18:22:12,898 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 26 from Vox, 42 from AS 2024-08-13 18:22:15,139 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.47 vs. limit=8.0 2024-08-13 18:22:16,902 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 from AS 2024-08-13 18:22:38,362 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 8100, loss[loss=0.1308, beats_loss=0.008862, ecapa_loss=0.0001443, whisper_loss=0.1205, over 24339.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01084, ecapa_loss=0.0001614, whisper_loss=0.09144, over 3867675.51 frames. 
], batch size: 92, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:22:45,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2254810.0, ans=0.2 2024-08-13 18:23:03,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2254910.0, ans=0.0 2024-08-13 18:23:14,286 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.90 vs. limit=12.0 2024-08-13 18:23:15,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2255010.0, ans=0.0 2024-08-13 18:23:31,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2255110.0, ans=0.125 2024-08-13 18:23:37,755 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.09 vs. limit=15.0 2024-08-13 18:23:42,096 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.99 vs. limit=15.0 2024-08-13 18:23:44,271 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 19 from Vox, 38 from AS 2024-08-13 18:23:49,958 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 8150, loss[loss=0.08644, beats_loss=0.008541, ecapa_loss=0.0001758, whisper_loss=0.07614, over 14173.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0108, ecapa_loss=0.0001618, whisper_loss=0.09105, over 3857401.04 frames. ], batch size: 53, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:23:54,597 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
28 from LS+wenet, 18 from Vox, 31 from AS 2024-08-13 18:23:58,131 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=15.0 2024-08-13 18:23:58,914 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 21 from Vox, 26 from AS 2024-08-13 18:24:09,233 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.440e+01 2.841e+01 3.164e+01 5.500e+01, threshold=5.681e+01, percent-clipped=1.0 2024-08-13 18:24:28,564 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 29 from Vox, 24 from AS 2024-08-13 18:24:37,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2255610.0, ans=0.2 2024-08-13 18:25:02,909 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 8200, loss[loss=0.09668, beats_loss=0.01303, ecapa_loss=0.0001394, whisper_loss=0.08225, over 21482.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01091, ecapa_loss=0.0001624, whisper_loss=0.08994, over 3864760.70 frames. ], batch size: 87, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:25:21,280 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 20 from LS+wenet, 23 from Vox, 40 from AS 2024-08-13 18:25:36,313 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 from AS 2024-08-13 18:25:46,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2256110.0, ans=0.1 2024-08-13 18:25:53,953 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 18 from Vox, 20 from AS 2024-08-13 18:25:55,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2256110.0, ans=0.125 2024-08-13 18:26:13,092 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
19 from LS+wenet, 22 from Vox, 38 from AS 2024-08-13 18:26:14,165 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 8250, loss[loss=0.08256, beats_loss=0.01238, ecapa_loss=0.0001552, whisper_loss=0.06863, over 18984.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01082, ecapa_loss=0.0001618, whisper_loss=0.09026, over 3881213.22 frames. ], batch size: 79, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:26:32,170 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.303e+01 2.576e+01 2.826e+01 3.811e+01, threshold=5.152e+01, percent-clipped=0.0 2024-08-13 18:26:34,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2256410.0, ans=0.125 2024-08-13 18:26:38,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2256410.0, ans=0.125 2024-08-13 18:26:48,475 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
28 from LS+wenet, 16 from Vox, 26 from AS 2024-08-13 18:27:04,566 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 18:27:08,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2256610.0, ans=0.1 2024-08-13 18:27:16,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2256710.0, ans=0.125 2024-08-13 18:27:24,821 WARNING [optim.py:496] (2/4) Scaling gradients by 0.04663357511162758, model_norm_threshold=51.52228546142578 2024-08-13 18:27:25,331 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.95, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.156e+06, grad_sumsq=1.333e+05, orig_rms_sq=8.675e+00 2024-08-13 18:27:25,357 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 8300, loss[loss=0.09483, beats_loss=0.01102, ecapa_loss=0.0001564, whisper_loss=0.08224, over 16548.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0108, ecapa_loss=0.000161, whisper_loss=0.09046, over 3885990.60 frames. ], batch size: 68, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:27:30,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2256810.0, ans=0.0 2024-08-13 18:27:31,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2256810.0, ans=0.1 2024-08-13 18:27:35,670 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.48 vs. limit=15.0 2024-08-13 18:27:44,569 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
26 from LS+wenet, 14 from Vox, 30 from AS 2024-08-13 18:27:48,235 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 19 from Vox, 36 from AS 2024-08-13 18:27:48,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2256910.0, ans=0.0 2024-08-13 18:27:51,932 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 20 from LS+wenet, 13 from Vox, 21 from AS 2024-08-13 18:28:13,224 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 23 from Vox, 35 from AS 2024-08-13 18:28:20,282 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.31 vs. limit=15.0 2024-08-13 18:28:26,434 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.68 vs. limit=15.0 2024-08-13 18:28:30,651 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 8350, loss[loss=0.1275, beats_loss=0.009762, ecapa_loss=0.0001542, whisper_loss=0.1162, over 15235.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01078, ecapa_loss=0.0001617, whisper_loss=0.09114, over 3887780.75 frames. ], batch size: 58, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:28:34,686 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 22 from Vox, 38 from AS 2024-08-13 18:28:37,388 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 12 from Vox, 29 from AS 2024-08-13 18:28:38,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2257310.0, ans=0.125 2024-08-13 18:28:41,355 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
29 from LS+wenet, 13 from Vox, 49 from AS 2024-08-13 18:28:42,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2257410.0, ans=0.0 2024-08-13 18:28:47,993 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.502e+01 2.789e+01 3.217e+01 1.105e+03, threshold=5.579e+01, percent-clipped=3.0 2024-08-13 18:28:51,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2257410.0, ans=0.125 2024-08-13 18:28:57,148 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 28 from Vox, 33 from AS 2024-08-13 18:29:00,168 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.30 vs. limit=15.0 2024-08-13 18:29:07,765 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 32 from Vox, 27 from AS 2024-08-13 18:29:32,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2257710.0, ans=0.2 2024-08-13 18:29:35,841 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 8400, loss[loss=0.1167, beats_loss=0.008371, ecapa_loss=0.0001963, whisper_loss=0.1064, over 21381.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01066, ecapa_loss=0.000162, whisper_loss=0.09176, over 3893086.65 frames. ], batch size: 88, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:29:38,527 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 from AS 2024-08-13 18:29:41,134 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 18 from Vox, 27 from AS 2024-08-13 18:29:47,912 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
27 from LS+wenet, 23 from Vox, 43 from AS 2024-08-13 18:29:52,241 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.88 vs. limit=10.0 2024-08-13 18:30:07,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2258010.0, ans=0.125 2024-08-13 18:30:12,125 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.24 vs. limit=15.0 2024-08-13 18:30:20,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2258110.0, ans=0.125 2024-08-13 18:30:22,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2258110.0, ans=0.125 2024-08-13 18:30:30,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2258210.0, ans=0.0 2024-08-13 18:30:42,833 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 8450, loss[loss=0.1017, beats_loss=0.009595, ecapa_loss=0.0001825, whisper_loss=0.09031, over 19007.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01063, ecapa_loss=0.0001619, whisper_loss=0.09215, over 3870496.64 frames. ], batch size: 78, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:30:44,426 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 19 from LS+wenet, 25 from Vox, 32 from AS 2024-08-13 18:30:46,659 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 16 from Vox, 32 from AS 2024-08-13 18:30:50,882 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
28 from LS+wenet, 24 from Vox, 37 from AS 2024-08-13 18:30:56,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2258410.0, ans=0.2 2024-08-13 18:30:59,946 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.525e+01 2.749e+01 3.077e+01 1.697e+02, threshold=5.498e+01, percent-clipped=1.0 2024-08-13 18:31:07,998 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 21 from Vox, 42 from AS 2024-08-13 18:31:08,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2258510.0, ans=0.125 2024-08-13 18:31:17,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2258510.0, ans=0.0 2024-08-13 18:31:18,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=2258510.0, ans=15.0 2024-08-13 18:31:19,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2258510.0, ans=0.125 2024-08-13 18:31:22,543 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0 2024-08-13 18:31:28,853 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 15 from Vox, 29 from AS 2024-08-13 18:31:32,631 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 19 from Vox, 29 from AS 2024-08-13 18:31:40,528 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
15 from LS+wenet, 19 from Vox, 33 from AS 2024-08-13 18:31:40,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2258710.0, ans=0.5 2024-08-13 18:31:48,357 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 8500, loss[loss=0.1053, beats_loss=0.01066, ecapa_loss=0.0001557, whisper_loss=0.0931, over 20871.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01064, ecapa_loss=0.0001621, whisper_loss=0.09171, over 3869983.23 frames. ], batch size: 81, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:31:58,193 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.56 vs. limit=15.0 2024-08-13 18:32:10,550 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 30 from LS+wenet, 18 from Vox, 23 from AS 2024-08-13 18:32:41,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2259210.0, ans=0.0 2024-08-13 18:32:54,198 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 8550, loss[loss=0.09758, beats_loss=0.01152, ecapa_loss=0.0001429, whisper_loss=0.08463, over 14594.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01063, ecapa_loss=0.0001624, whisper_loss=0.09167, over 3868181.83 frames. ], batch size: 56, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:32:58,244 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 21 from LS+wenet, 18 from Vox, 18 from AS 2024-08-13 18:32:58,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2259310.0, ans=0.125 2024-08-13 18:32:59,702 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
34 from LS+wenet, 19 from Vox, 37 from AS 2024-08-13 18:33:04,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2259310.0, ans=0.2 2024-08-13 18:33:07,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2259410.0, ans=0.125 2024-08-13 18:33:10,940 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.379e+01 2.646e+01 2.938e+01 4.520e+01, threshold=5.292e+01, percent-clipped=0.0 2024-08-13 18:33:42,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2259610.0, ans=0.2 2024-08-13 18:33:58,729 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 8600, loss[loss=0.08133, beats_loss=0.009321, ecapa_loss=0.0001956, whisper_loss=0.07006, over 16610.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01062, ecapa_loss=0.0001625, whisper_loss=0.09235, over 3882270.44 frames. ], batch size: 67, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:34:00,414 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 20 from LS+wenet, 29 from Vox, 40 from AS 2024-08-13 18:34:31,186 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 15 from Vox, 36 from AS 2024-08-13 18:34:49,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=2260110.0, ans=22.5 2024-08-13 18:34:53,920 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 20 from Vox, 35 from AS 2024-08-13 18:35:06,607 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 8650, loss[loss=0.08271, beats_loss=0.01031, ecapa_loss=0.0001884, whisper_loss=0.07052, over 15581.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01059, ecapa_loss=0.0001625, whisper_loss=0.09256, over 3885214.89 frames. 
], batch size: 63, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:35:23,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2260410.0, ans=0.125 2024-08-13 18:35:24,167 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.420e+01 2.581e+01 2.912e+01 4.652e+01, threshold=5.162e+01, percent-clipped=0.0 2024-08-13 18:35:37,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2260510.0, ans=0.0 2024-08-13 18:35:43,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2260510.0, ans=0.2 2024-08-13 18:35:48,527 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 27 from LS+wenet, 15 from Vox, 24 from AS 2024-08-13 18:36:14,658 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 8700, loss[loss=0.1198, beats_loss=0.009478, ecapa_loss=0.0001641, whisper_loss=0.1087, over 21520.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01058, ecapa_loss=0.0001639, whisper_loss=0.0926, over 3915810.29 frames. ], batch size: 84, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:36:16,295 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.64 vs. limit=22.5 2024-08-13 18:36:28,042 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.49 vs. limit=22.5 2024-08-13 18:36:30,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2260910.0, ans=10.0 2024-08-13 18:36:33,740 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 23 from Vox, 44 from AS 2024-08-13 18:36:57,088 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
29 from LS+wenet, 30 from Vox, 33 from AS 2024-08-13 18:36:57,590 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.52 vs. limit=15.0 2024-08-13 18:37:06,735 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. limit=6.0 2024-08-13 18:37:08,808 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.81 vs. limit=22.5 2024-08-13 18:37:10,315 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.47 vs. limit=10.0 2024-08-13 18:37:10,347 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.02 vs. limit=15.0 2024-08-13 18:37:11,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2261110.0, ans=0.1 2024-08-13 18:37:21,251 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 24 from Vox, 43 from AS 2024-08-13 18:37:24,973 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.51 vs. limit=15.0 2024-08-13 18:37:28,826 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.55 vs. limit=15.0 2024-08-13 18:37:30,181 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.53 vs. 
limit=15.0 2024-08-13 18:37:30,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=2261210.0, ans=0.2 2024-08-13 18:37:32,888 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 8750, loss[loss=0.1057, beats_loss=0.01064, ecapa_loss=0.0001542, whisper_loss=0.09351, over 17759.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01058, ecapa_loss=0.000164, whisper_loss=0.09205, over 3877058.88 frames. ], batch size: 71, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:37:45,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2261410.0, ans=0.1 2024-08-13 18:37:48,924 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 18 from Vox, 30 from AS 2024-08-13 18:37:50,599 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.392e+01 2.713e+01 3.025e+01 4.261e+01, threshold=5.425e+01, percent-clipped=0.0 2024-08-13 18:37:54,256 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 26 from LS+wenet, 15 from Vox, 30 from AS 2024-08-13 18:38:09,575 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.50 vs. limit=22.5 2024-08-13 18:38:17,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2261610.0, ans=0.125 2024-08-13 18:38:48,109 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 22 from Vox, 43 from AS 2024-08-13 18:38:54,720 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 8800, loss[loss=0.09126, beats_loss=0.01244, ecapa_loss=0.000132, whisper_loss=0.0775, over 13921.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01073, ecapa_loss=0.0001629, whisper_loss=0.09211, over 3882702.80 frames. 
], batch size: 53, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:39:34,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2262010.0, ans=0.0 2024-08-13 18:39:42,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2262010.0, ans=0.0 2024-08-13 18:39:57,358 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.61 vs. limit=10.0 2024-08-13 18:40:01,269 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 22 from Vox, 27 from AS 2024-08-13 18:40:04,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2262110.0, ans=0.0 2024-08-13 18:40:21,617 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.96 vs. limit=15.0 2024-08-13 18:40:28,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2262210.0, ans=0.125 2024-08-13 18:40:33,272 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 8850, loss[loss=0.07623, beats_loss=0.01328, ecapa_loss=0.0001486, whisper_loss=0.06146, over 17409.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01073, ecapa_loss=0.0001619, whisper_loss=0.09203, over 3869143.08 frames. ], batch size: 72, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:40:54,540 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 22 from Vox, 19 from AS 2024-08-13 18:40:57,441 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.352e+01 2.612e+01 3.083e+01 5.604e+01, threshold=5.223e+01, percent-clipped=1.0 2024-08-13 18:41:09,127 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
30 from LS+wenet, 23 from Vox, 28 from AS 2024-08-13 18:41:09,828 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.71 vs. limit=15.0 2024-08-13 18:41:18,883 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 18 from Vox, 32 from AS 2024-08-13 18:41:20,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2262510.0, ans=0.1 2024-08-13 18:41:34,138 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.60 vs. limit=15.0 2024-08-13 18:41:44,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2262610.0, ans=0.0 2024-08-13 18:41:53,282 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 16 from Vox, 24 from AS 2024-08-13 18:41:58,334 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 11 from Vox, 27 from AS 2024-08-13 18:42:03,698 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 22 from Vox, 21 from AS 2024-08-13 18:42:05,996 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 26 from LS+wenet, 10 from Vox, 30 from AS 2024-08-13 18:42:09,444 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 8900, loss[loss=0.1155, beats_loss=0.009639, ecapa_loss=0.0001598, whisper_loss=0.1043, over 23050.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01073, ecapa_loss=0.0001622, whisper_loss=0.09184, over 3876260.38 frames. ], batch size: 90, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:42:10,907 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
20 from LS+wenet, 10 from Vox, 43 fro AS 2024-08-13 18:42:19,475 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.35 vs. limit=15.0 2024-08-13 18:42:19,502 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.49 vs. limit=22.5 2024-08-13 18:42:29,097 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.16 vs. limit=15.0 2024-08-13 18:42:33,380 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-13 18:42:39,547 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.55 vs. limit=15.0 2024-08-13 18:42:44,379 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 23 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-13 18:42:52,946 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.809e-01 2024-08-13 18:42:57,641 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 18:43:18,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2263110.0, ans=0.125 2024-08-13 18:43:33,205 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 8950, loss[loss=0.1033, beats_loss=0.01015, ecapa_loss=0.0001664, whisper_loss=0.09143, over 16818.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0108, ecapa_loss=0.0001612, whisper_loss=0.09171, over 3858472.29 frames. 
], batch size: 65, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:43:34,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2263310.0, ans=0.09899494936611666 2024-08-13 18:43:41,100 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=12.0 2024-08-13 18:43:48,296 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 16 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-13 18:43:49,623 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.368e+01 2.604e+01 2.902e+01 4.386e+01, threshold=5.207e+01, percent-clipped=0.0 2024-08-13 18:44:04,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2263510.0, ans=0.125 2024-08-13 18:44:13,610 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-13 18:44:16,357 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 18:44:31,368 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs. limit=6.0 2024-08-13 18:44:38,236 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 9000, loss[loss=0.1329, beats_loss=0.008298, ecapa_loss=0.0001665, whisper_loss=0.123, over 23649.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01074, ecapa_loss=0.0001624, whisper_loss=0.09154, over 3859531.10 frames. 
], batch size: 90, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:44:38,237 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-13 18:45:16,316 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.4912, 3.2522, 2.5918, 2.8731], device='cuda:2') 2024-08-13 18:45:17,626 INFO [train_multi_KD3.py:1149] (2/4) Epoch 16, validation on ASR_libri: loss=0.2538, beats_loss=0, ecapa_loss=0.0005571, whisper_loss=0.2482, over 922467.00 frames. 2024-08-13 18:45:37,376 INFO [train_multi_KD3.py:1149] (2/4) Epoch 16, validation on SV_voxceleb1: loss=0.004514, beats_loss=0, ecapa_loss=0.0004514, whisper_loss=0, over 939242.00 frames. 2024-08-13 18:47:39,615 INFO [train_multi_KD3.py:1149] (2/4) Epoch 16, validation on AT_audioset: loss=0.02381, beats_loss=0.02381, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 18:47:39,619 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-13 18:47:52,611 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 15 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 18:47:52,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2263910.0, ans=0.2 2024-08-13 18:47:56,336 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 26 from LS+wenet, 16 from Vox, 53 fro AS 2024-08-13 18:48:06,415 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
33 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-13 18:48:16,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2264010.0, ans=0.125 2024-08-13 18:48:28,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2264110.0, ans=0.125 2024-08-13 18:48:44,070 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.17 vs. limit=15.0 2024-08-13 18:48:47,716 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 9050, loss[loss=0.1076, beats_loss=0.0107, ecapa_loss=0.0001534, whisper_loss=0.09539, over 17105.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01083, ecapa_loss=0.0001616, whisper_loss=0.09151, over 3866103.97 frames. ], batch size: 68, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:48:49,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2264310.0, ans=0.125 2024-08-13 18:49:05,173 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.411e+01 2.657e+01 3.042e+01 5.076e+01, threshold=5.314e+01, percent-clipped=0.0 2024-08-13 18:49:07,969 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 18:49:13,527 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 20 from LS+wenet, 22 from Vox, 53 fro AS 2024-08-13 18:49:48,587 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.53 vs. limit=22.5 2024-08-13 18:49:57,098 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 9100, loss[loss=0.1366, beats_loss=0.00833, ecapa_loss=0.0001426, whisper_loss=0.1268, over 15273.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0109, ecapa_loss=0.0001614, whisper_loss=0.09158, over 3892970.45 frames. 
], batch size: 55, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:49:57,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2264810.0, ans=0.125 2024-08-13 18:50:05,304 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-13 18:50:09,165 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-13 18:50:13,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2264910.0, ans=0.125 2024-08-13 18:50:20,332 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.26 vs. limit=15.0 2024-08-13 18:50:28,368 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2024-08-13 18:50:45,095 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-13 18:50:47,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2265110.0, ans=0.2 2024-08-13 18:50:58,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2265210.0, ans=0.125 2024-08-13 18:51:03,856 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 18:51:04,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2265210.0, ans=0.0 2024-08-13 18:51:07,391 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 9150, loss[loss=0.1145, beats_loss=0.009317, ecapa_loss=0.0001332, whisper_loss=0.1039, over 20106.00 frames. 
], tot_loss[loss=0.1045, beats_loss=0.01086, ecapa_loss=0.0001618, whisper_loss=0.09203, over 3877649.49 frames. ], batch size: 73, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:51:15,041 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 19 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-13 18:51:18,178 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 30 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 18:51:26,146 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.406e+01 2.793e+01 3.104e+01 4.161e+01, threshold=5.587e+01, percent-clipped=0.0 2024-08-13 18:51:41,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2265510.0, ans=0.0 2024-08-13 18:51:42,730 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-13 18:51:54,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2265610.0, ans=0.5 2024-08-13 18:52:15,529 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.27 vs. limit=10.0 2024-08-13 18:52:17,768 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 9200, loss[loss=0.09017, beats_loss=0.01313, ecapa_loss=0.0001209, whisper_loss=0.07583, over 19519.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01087, ecapa_loss=0.0001625, whisper_loss=0.09198, over 3872371.73 frames. ], batch size: 75, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:52:19,264 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 19 from LS+wenet, 9 from Vox, 26 fro AS 2024-08-13 18:52:25,356 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. 
limit=6.0 2024-08-13 18:52:31,980 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.60 vs. limit=10.0 2024-08-13 18:52:39,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2265910.0, ans=0.0 2024-08-13 18:52:55,270 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=15.0 2024-08-13 18:53:05,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2266110.0, ans=0.2 2024-08-13 18:53:24,388 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 9250, loss[loss=0.101, beats_loss=0.01193, ecapa_loss=0.0001301, whisper_loss=0.08775, over 17003.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01089, ecapa_loss=0.0001633, whisper_loss=0.09142, over 3893984.76 frames. ], batch size: 68, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:53:27,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2266310.0, ans=0.0 2024-08-13 18:53:28,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2266310.0, ans=0.05 2024-08-13 18:53:34,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2266310.0, ans=0.2 2024-08-13 18:53:41,516 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.413e+01 2.566e+01 3.082e+01 5.176e+01, threshold=5.131e+01, percent-clipped=0.0 2024-08-13 18:54:13,414 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 18:54:17,888 INFO [scaling.py:1120] (2/4) WithLoss: 
name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 18:54:27,859 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.23 vs. limit=10.0 2024-08-13 18:54:32,030 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 9300, loss[loss=0.09904, beats_loss=0.008832, ecapa_loss=0.00019, whisper_loss=0.08831, over 16829.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01087, ecapa_loss=0.0001632, whisper_loss=0.09145, over 3890495.81 frames. ], batch size: 67, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:54:39,007 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 37 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-13 18:54:52,942 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 13 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-13 18:54:54,372 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-13 18:54:57,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2266910.0, ans=0.0 2024-08-13 18:55:05,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2267010.0, ans=0.1 2024-08-13 18:55:13,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2267110.0, ans=0.125 2024-08-13 18:55:39,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2267210.0, ans=0.125 2024-08-13 18:55:40,024 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.28 vs. 
limit=15.0 2024-08-13 18:55:41,684 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 9350, loss[loss=0.1136, beats_loss=0.01154, ecapa_loss=0.0001643, whisper_loss=0.1004, over 14177.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01087, ecapa_loss=0.0001635, whisper_loss=0.0911, over 3822941.45 frames. ], batch size: 57, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:55:58,983 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.391e+01 2.659e+01 2.911e+01 1.966e+02, threshold=5.317e+01, percent-clipped=2.0 2024-08-13 18:56:03,841 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-13 18:56:09,154 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-13 18:56:16,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2267510.0, ans=0.125 2024-08-13 18:56:23,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2267610.0, ans=0.1 2024-08-13 18:56:23,881 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 18 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-13 18:56:33,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2267610.0, ans=0.1 2024-08-13 18:56:35,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2267710.0, ans=0.0 2024-08-13 18:56:37,716 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=15.0 2024-08-13 18:56:48,860 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 9400, loss[loss=0.09959, beats_loss=0.01285, ecapa_loss=0.0001374, whisper_loss=0.08536, over 13532.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01082, ecapa_loss=0.0001636, whisper_loss=0.09122, over 3842279.77 frames. ], batch size: 53, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:56:55,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2267810.0, ans=0.2 2024-08-13 18:57:08,174 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0 2024-08-13 18:57:25,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2268010.0, ans=0.0 2024-08-13 18:57:27,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2268110.0, ans=0.125 2024-08-13 18:57:30,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2268110.0, ans=0.1 2024-08-13 18:57:33,346 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.10 vs. limit=15.0 2024-08-13 18:57:35,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2268110.0, ans=0.1 2024-08-13 18:57:54,968 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 9450, loss[loss=0.09599, beats_loss=0.01368, ecapa_loss=0.0001619, whisper_loss=0.08069, over 20864.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01092, ecapa_loss=0.0001625, whisper_loss=0.09078, over 3876087.50 frames. 
], batch size: 89, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:57:56,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2268310.0, ans=0.2 2024-08-13 18:58:08,291 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-13 18:58:12,045 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.364e+01 2.605e+01 2.951e+01 9.303e+01, threshold=5.211e+01, percent-clipped=2.0 2024-08-13 18:58:16,420 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0 2024-08-13 18:58:17,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2268410.0, ans=0.125 2024-08-13 18:58:22,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2268510.0, ans=0.0 2024-08-13 18:58:49,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2268710.0, ans=0.0 2024-08-13 18:58:51,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2268710.0, ans=0.125 2024-08-13 18:58:53,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2268710.0, ans=0.125 2024-08-13 18:58:59,086 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 22 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-13 18:59:00,209 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 9500, loss[loss=0.1159, beats_loss=0.009383, ecapa_loss=0.0002191, whisper_loss=0.1043, over 14285.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01086, ecapa_loss=0.0001638, whisper_loss=0.08992, over 3856039.46 frames. 
], batch size: 58, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 18:59:10,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2268810.0, ans=0.125 2024-08-13 18:59:15,068 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 31 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-13 18:59:26,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2269010.0, ans=0.125 2024-08-13 18:59:35,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2269010.0, ans=0.0 2024-08-13 18:59:53,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2269210.0, ans=0.0 2024-08-13 19:00:05,301 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 19:00:06,423 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 9550, loss[loss=0.104, beats_loss=0.01155, ecapa_loss=0.0001405, whisper_loss=0.09107, over 17762.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01078, ecapa_loss=0.0001638, whisper_loss=0.09042, over 3861597.40 frames. ], batch size: 69, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 19:00:07,914 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-13 19:00:10,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2269310.0, ans=0.025 2024-08-13 19:00:23,772 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.300e+01 2.521e+01 2.795e+01 4.846e+01, threshold=5.041e+01, percent-clipped=0.0 2024-08-13 19:00:26,542 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 36 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-13 19:00:30,228 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
24 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-13 19:00:36,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2269510.0, ans=0.0 2024-08-13 19:00:52,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2269610.0, ans=0.125 2024-08-13 19:00:56,711 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 19:01:09,375 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 21 from LS+wenet, 11 from Vox, 21 fro AS 2024-08-13 19:01:11,689 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 9600, loss[loss=0.09421, beats_loss=0.01164, ecapa_loss=0.0001172, whisper_loss=0.08139, over 17972.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01078, ecapa_loss=0.0001647, whisper_loss=0.09027, over 3848792.36 frames. ], batch size: 68, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 19:01:15,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2269810.0, ans=10.0 2024-08-13 19:01:15,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2269810.0, ans=0.125 2024-08-13 19:01:19,593 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-13 19:01:21,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2269810.0, ans=0.09899494936611666 2024-08-13 19:01:22,686 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.15 vs. limit=22.5 2024-08-13 19:01:23,367 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
24 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-13 19:01:26,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2269910.0, ans=0.1 2024-08-13 19:01:29,541 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-13 19:01:36,590 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 19:01:37,847 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-13 19:01:49,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2270110.0, ans=0.0 2024-08-13 19:02:00,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2270110.0, ans=0.125 2024-08-13 19:02:10,207 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 17 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 19:02:10,845 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.63 vs. limit=22.5 2024-08-13 19:02:16,654 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 9650, loss[loss=0.1061, beats_loss=0.01221, ecapa_loss=0.0001375, whisper_loss=0.09255, over 18920.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01078, ecapa_loss=0.000165, whisper_loss=0.09097, over 3861981.90 frames. 
], batch size: 74, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 19:02:18,357 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2270310.0, ans=0.125 2024-08-13 19:02:19,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2270310.0, ans=0.125 2024-08-13 19:02:22,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2270310.0, ans=0.2 2024-08-13 19:02:28,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2270410.0, ans=0.2 2024-08-13 19:02:33,502 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.348e+01 2.592e+01 2.887e+01 4.146e+01, threshold=5.184e+01, percent-clipped=0.0 2024-08-13 19:02:37,014 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.44 vs. limit=15.0 2024-08-13 19:02:38,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2270410.0, ans=0.1 2024-08-13 19:03:09,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2270710.0, ans=0.0 2024-08-13 19:03:18,517 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.97 vs. limit=22.5 2024-08-13 19:03:21,779 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 9700, loss[loss=0.108, beats_loss=0.008702, ecapa_loss=0.0001784, whisper_loss=0.09753, over 17919.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01077, ecapa_loss=0.0001656, whisper_loss=0.09068, over 3835600.25 frames. 
], batch size: 67, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 19:03:24,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2270810.0, ans=0.2 2024-08-13 19:03:45,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2270910.0, ans=0.2 2024-08-13 19:04:01,078 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-13 19:04:02,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2271110.0, ans=0.125 2024-08-13 19:04:02,915 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0 2024-08-13 19:04:08,669 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-13 19:04:22,033 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 41 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 19:04:26,930 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 9750, loss[loss=0.0771, beats_loss=0.01402, ecapa_loss=0.000174, whisper_loss=0.06134, over 20165.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01082, ecapa_loss=0.0001649, whisper_loss=0.09066, over 3842397.17 frames. ], batch size: 85, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 19:04:30,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2271310.0, ans=0.1 2024-08-13 19:04:35,177 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
27 from LS+wenet, 22 from Vox, 41 from AS
2024-08-13 19:04:41,851 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 19:04:43,860 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.462e+01 2.717e+01 3.058e+01 5.863e+01, threshold=5.433e+01, percent-clipped=1.0
2024-08-13 19:04:54,227 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.46 vs. limit=22.5
2024-08-13 19:05:23,479 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 21 from LS+wenet, 11 from Vox, 23 from AS
2024-08-13 19:05:32,147 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 9800, loss[loss=0.09499, beats_loss=0.009874, ecapa_loss=0.0001942, whisper_loss=0.08317, over 21047.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01088, ecapa_loss=0.0001637, whisper_loss=0.09054, over 3850562.48 frames. ], batch size: 89, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:06:07,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2272010.0, ans=0.2
2024-08-13 19:06:11,024 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 25 from LS+wenet, 22 from Vox, 38 from AS
2024-08-13 19:06:22,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2272110.0, ans=0.0
2024-08-13 19:06:27,175 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 27 from Vox, 35 from AS
2024-08-13 19:06:27,755 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.06 vs. limit=22.5
2024-08-13 19:06:31,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2272210.0, ans=0.0
2024-08-13 19:06:37,810 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 9850, loss[loss=0.117, beats_loss=0.009502, ecapa_loss=0.0002311, whisper_loss=0.1052, over 21940.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01081, ecapa_loss=0.0001643, whisper_loss=0.09179, over 3868570.94 frames. ], batch size: 93, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:06:48,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2272310.0, ans=0.125
2024-08-13 19:06:53,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2272410.0, ans=0.07
2024-08-13 19:06:54,556 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.358e+01 2.690e+01 3.043e+01 6.098e+01, threshold=5.380e+01, percent-clipped=1.0
2024-08-13 19:06:55,338 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.00 vs. limit=12.0
2024-08-13 19:07:12,027 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 24 from Vox, 33 from AS
2024-08-13 19:07:14,777 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 38 from LS+wenet, 18 from Vox, 32 from AS
2024-08-13 19:07:21,691 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=22.5
2024-08-13 19:07:27,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2272610.0, ans=0.125
2024-08-13 19:07:31,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2272710.0, ans=0.125
2024-08-13 19:07:39,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2272710.0, ans=0.0
2024-08-13 19:07:41,540 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 16 from Vox, 38 from AS
2024-08-13 19:07:42,743 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 9900, loss[loss=0.09766, beats_loss=0.01272, ecapa_loss=0.0001448, whisper_loss=0.0835, over 19174.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01081, ecapa_loss=0.0001648, whisper_loss=0.09211, over 3874413.89 frames. ], batch size: 77, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:07:57,829 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.67 vs. limit=10.0
2024-08-13 19:08:21,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2273110.0, ans=0.2
2024-08-13 19:08:26,454 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 18 from Vox, 44 from AS
2024-08-13 19:08:33,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2273210.0, ans=0.015
2024-08-13 19:08:36,902 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 21 from LS+wenet, 26 from Vox, 36 from AS
2024-08-13 19:08:46,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2273310.0, ans=0.125
2024-08-13 19:08:47,034 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 9950, loss[loss=0.1128, beats_loss=0.01166, ecapa_loss=0.0001713, whisper_loss=0.09946, over 23688.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0109, ecapa_loss=0.0001645, whisper_loss=0.09168, over 3885474.92 frames. ], batch size: 94, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:09:00,025 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 30 from LS+wenet, 16 from Vox, 30 from AS
2024-08-13 19:09:03,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2273410.0, ans=0.0
2024-08-13 19:09:04,083 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.022e+01 2.452e+01 2.692e+01 3.113e+01 1.874e+02, threshold=5.385e+01, percent-clipped=3.0
2024-08-13 19:09:20,646 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 21 from Vox, 38 from AS
2024-08-13 19:09:30,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2273610.0, ans=0.0
2024-08-13 19:09:35,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2273610.0, ans=0.0
2024-08-13 19:09:49,647 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 22 from LS+wenet, 13 from Vox, 27 from AS
2024-08-13 19:09:50,964 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 17 from LS+wenet, 23 from Vox, 46 from AS
2024-08-13 19:09:52,237 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 10000, loss[loss=0.07997, beats_loss=0.01442, ecapa_loss=0.0001209, whisper_loss=0.06435, over 21208.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01093, ecapa_loss=0.000164, whisper_loss=0.09112, over 3888729.39 frames. ], batch size: 86, lr: 3.89e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:10:00,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2273810.0, ans=0.0
2024-08-13 19:10:02,411 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 20 from Vox, 32 from AS
2024-08-13 19:10:15,179 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 14 from LS+wenet, 21 from Vox, 25 from AS
2024-08-13 19:10:33,523 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.54 vs. limit=15.0
2024-08-13 19:10:57,529 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 10050, loss[loss=0.1104, beats_loss=0.01011, ecapa_loss=0.0001547, whisper_loss=0.09873, over 19997.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01089, ecapa_loss=0.0001637, whisper_loss=0.09089, over 3901837.35 frames. ], batch size: 77, lr: 3.89e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:11:03,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2274310.0, ans=0.125
2024-08-13 19:11:04,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2274310.0, ans=0.125
2024-08-13 19:11:12,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2274410.0, ans=0.0
2024-08-13 19:11:14,247 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.462e+01 2.706e+01 3.125e+01 1.991e+02, threshold=5.413e+01, percent-clipped=1.0
2024-08-13 19:11:14,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2274410.0, ans=0.0
2024-08-13 19:11:15,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2274410.0, ans=0.1
2024-08-13 19:11:18,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2274410.0, ans=0.0
2024-08-13 19:11:20,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2274410.0, ans=0.2
2024-08-13 19:11:23,147 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 from AS
2024-08-13 19:11:32,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2274510.0, ans=0.125
2024-08-13 19:11:36,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2274610.0, ans=0.125
2024-08-13 19:11:37,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2274610.0, ans=0.5
2024-08-13 19:11:40,988 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 36 from LS+wenet, 25 from Vox, 33 from AS
2024-08-13 19:11:53,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2274710.0, ans=0.1
2024-08-13 19:12:01,193 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 10100, loss[loss=0.1105, beats_loss=0.007595, ecapa_loss=0.000228, whisper_loss=0.1006, over 15414.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01087, ecapa_loss=0.000164, whisper_loss=0.09134, over 3929809.21 frames. ], batch size: 67, lr: 3.89e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:12:02,697 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 24 from Vox, 33 from AS
2024-08-13 19:12:08,582 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0
2024-08-13 19:12:10,572 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 26 from Vox, 25 from AS
2024-08-13 19:12:11,810 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 20 from LS+wenet, 24 from Vox, 35 from AS
2024-08-13 19:12:14,823 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.97 vs. limit=6.0
2024-08-13 19:12:18,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2274910.0, ans=0.04949747468305833
2024-08-13 19:12:22,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2274910.0, ans=0.125
2024-08-13 19:12:27,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2275010.0, ans=0.125
2024-08-13 19:12:40,154 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.95 vs. limit=22.5
2024-08-13 19:12:42,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2275110.0, ans=0.125
2024-08-13 19:12:43,385 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 16 from LS+wenet, 23 from Vox, 42 from AS
2024-08-13 19:12:48,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2275110.0, ans=0.125
2024-08-13 19:12:57,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2275210.0, ans=0.05
2024-08-13 19:13:06,041 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0
2024-08-13 19:13:06,635 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 10150, loss[loss=0.09939, beats_loss=0.009729, ecapa_loss=0.0001564, whisper_loss=0.08809, over 23701.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01081, ecapa_loss=0.0001644, whisper_loss=0.09123, over 3922263.32 frames. ], batch size: 92, lr: 3.89e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:13:17,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2275310.0, ans=0.0
2024-08-13 19:13:23,075 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.81 vs. limit=15.0
2024-08-13 19:13:24,608 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.395e+01 2.644e+01 2.917e+01 4.595e+01, threshold=5.288e+01, percent-clipped=0.0
2024-08-13 19:13:24,801 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 14 from LS+wenet, 11 from Vox, 32 from AS
2024-08-13 19:13:35,658 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.49 vs. limit=5.0
2024-08-13 19:13:48,810 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 21 from LS+wenet, 27 from Vox, 45 from AS
2024-08-13 19:13:51,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2275610.0, ans=0.025
2024-08-13 19:14:00,036 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 21 from Vox, 38 from AS
2024-08-13 19:14:10,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=2275710.0, ans=15.0
2024-08-13 19:14:15,975 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 10200, loss[loss=0.1043, beats_loss=0.0109, ecapa_loss=0.0001393, whisper_loss=0.09204, over 24045.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01082, ecapa_loss=0.0001637, whisper_loss=0.09112, over 3898226.78 frames. ], batch size: 93, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:14:20,012 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 30 from LS+wenet, 20 from Vox, 30 from AS
2024-08-13 19:14:21,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2275810.0, ans=0.125
2024-08-13 19:14:47,488 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 36 from LS+wenet, 24 from Vox, 28 from AS
2024-08-13 19:14:50,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2276010.0, ans=0.0
2024-08-13 19:14:51,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2276010.0, ans=0.125
2024-08-13 19:15:12,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2276110.0, ans=0.125
2024-08-13 19:15:16,557 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.54 vs. limit=12.0
2024-08-13 19:15:24,227 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 21 from LS+wenet, 29 from Vox, 43 from AS
2024-08-13 19:15:28,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2276210.0, ans=0.125
2024-08-13 19:15:31,276 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 10250, loss[loss=0.1001, beats_loss=0.01067, ecapa_loss=0.000194, whisper_loss=0.08746, over 23142.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01077, ecapa_loss=0.0001641, whisper_loss=0.0912, over 3892610.26 frames. ], batch size: 92, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:15:41,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2276310.0, ans=0.1
2024-08-13 19:15:43,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2276310.0, ans=0.125
2024-08-13 19:15:44,757 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 25 from Vox, 33 from AS
2024-08-13 19:15:48,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2276410.0, ans=0.125
2024-08-13 19:15:50,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2276410.0, ans=0.0
2024-08-13 19:15:53,069 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.373e+01 2.735e+01 3.155e+01 5.239e+01, threshold=5.471e+01, percent-clipped=0.0
2024-08-13 19:15:54,485 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 12 from LS+wenet, 19 from Vox, 27 from AS
2024-08-13 19:16:07,875 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=12.0
2024-08-13 19:16:36,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2276710.0, ans=0.0
2024-08-13 19:16:41,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2276710.0, ans=0.125
2024-08-13 19:16:46,605 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 10300, loss[loss=0.1202, beats_loss=0.0107, ecapa_loss=0.0001704, whisper_loss=0.1078, over 22186.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01081, ecapa_loss=0.0001636, whisper_loss=0.09113, over 3890583.16 frames. ], batch size: 88, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:16:55,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2276810.0, ans=0.0
2024-08-13 19:17:01,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2276910.0, ans=0.125
2024-08-13 19:17:01,953 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.13 vs. limit=22.5
2024-08-13 19:17:23,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2277010.0, ans=0.125
2024-08-13 19:17:35,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2277110.0, ans=0.125
2024-08-13 19:17:51,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2277210.0, ans=0.0
2024-08-13 19:17:59,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2277210.0, ans=10.0
2024-08-13 19:18:03,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2277310.0, ans=0.0
2024-08-13 19:18:04,495 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 10350, loss[loss=0.1085, beats_loss=0.01092, ecapa_loss=0.000125, whisper_loss=0.09632, over 17019.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01086, ecapa_loss=0.0001632, whisper_loss=0.09081, over 3914429.62 frames. ], batch size: 66, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:18:04,649 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 14 from Vox, 40 from AS
2024-08-13 19:18:05,545 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.80 vs. limit=15.0
2024-08-13 19:18:10,164 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.47 vs. limit=15.0
2024-08-13 19:18:26,221 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.410e+01 2.741e+01 3.127e+01 1.313e+02, threshold=5.482e+01, percent-clipped=3.0
2024-08-13 19:18:36,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2277510.0, ans=0.125
2024-08-13 19:18:36,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2277510.0, ans=0.125
2024-08-13 19:18:41,933 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 18 from Vox, 50 from AS
2024-08-13 19:19:05,496 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 19 from Vox, 23 from AS
2024-08-13 19:19:07,048 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 29 from Vox, 33 from AS
2024-08-13 19:19:10,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2277710.0, ans=0.0
2024-08-13 19:19:11,205 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 23 from LS+wenet, 23 from Vox, 39 from AS
2024-08-13 19:19:20,767 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 10400, loss[loss=0.09576, beats_loss=0.01224, ecapa_loss=0.0001546, whisper_loss=0.08197, over 22051.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01083, ecapa_loss=0.0001618, whisper_loss=0.09047, over 3915509.08 frames. ], batch size: 90, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:19:24,272 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0
2024-08-13 19:19:26,954 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 32 from Vox, 30 from AS
2024-08-13 19:19:33,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2277810.0, ans=0.125
2024-08-13 19:19:37,549 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 19 from Vox, 25 from AS
2024-08-13 19:19:40,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2277910.0, ans=0.125
2024-08-13 19:19:47,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2277910.0, ans=0.0
2024-08-13 19:19:47,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2277910.0, ans=0.0
2024-08-13 19:20:03,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2278110.0, ans=0.125
2024-08-13 19:20:10,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2278110.0, ans=0.125
2024-08-13 19:20:13,136 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 14 from Vox, 37 from AS
2024-08-13 19:20:27,965 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 from AS
2024-08-13 19:20:33,581 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 10450, loss[loss=0.1052, beats_loss=0.008728, ecapa_loss=0.0001777, whisper_loss=0.0947, over 18264.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01086, ecapa_loss=0.0001625, whisper_loss=0.09017, over 3891841.65 frames. ], batch size: 73, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:20:55,257 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.408e+01 2.686e+01 2.992e+01 7.083e+01, threshold=5.372e+01, percent-clipped=1.0
2024-08-13 19:21:10,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2278510.0, ans=0.2
2024-08-13 19:21:28,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2278610.0, ans=0.0
2024-08-13 19:21:49,446 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 10500, loss[loss=0.09372, beats_loss=0.0112, ecapa_loss=0.0001724, whisper_loss=0.0808, over 14586.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01086, ecapa_loss=0.0001641, whisper_loss=0.09037, over 3903996.23 frames. ], batch size: 61, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:22:00,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2278810.0, ans=0.0
2024-08-13 19:22:04,675 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=22.5
2024-08-13 19:22:05,293 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 25 from LS+wenet, 16 from Vox, 19 from AS
2024-08-13 19:22:14,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2278910.0, ans=0.2
2024-08-13 19:22:16,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2278910.0, ans=0.1
2024-08-13 19:22:18,921 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 16 from Vox, 23 from AS
2024-08-13 19:22:29,628 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 19 from Vox, 31 from AS
2024-08-13 19:22:36,568 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 22 from Vox, 32 from AS
2024-08-13 19:22:50,708 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 25 from Vox, 28 from AS
2024-08-13 19:22:58,194 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 16 from Vox, 23 from AS
2024-08-13 19:23:02,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2279210.0, ans=0.04949747468305833
2024-08-13 19:23:05,942 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 10550, loss[loss=0.0682, beats_loss=0.01045, ecapa_loss=0.0002294, whisper_loss=0.05546, over 13137.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01078, ecapa_loss=0.0001658, whisper_loss=0.09021, over 3851600.86 frames. ], batch size: 57, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:23:15,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2279310.0, ans=0.0
2024-08-13 19:23:18,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2279310.0, ans=0.0
2024-08-13 19:23:29,192 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.431e+01 2.823e+01 3.244e+01 7.825e+01, threshold=5.646e+01, percent-clipped=1.0
2024-08-13 19:23:31,805 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.78 vs. limit=15.0
2024-08-13 19:23:33,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2279410.0, ans=0.125
2024-08-13 19:23:35,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2279410.0, ans=0.125
2024-08-13 19:23:43,507 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 15 from Vox, 36 from AS
2024-08-13 19:23:55,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2279610.0, ans=0.1
2024-08-13 19:23:58,684 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 23 from Vox, 39 from AS
2024-08-13 19:24:04,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2279610.0, ans=0.125
2024-08-13 19:24:16,766 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 21 from LS+wenet, 32 from Vox, 37 from AS
2024-08-13 19:24:17,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2279710.0, ans=0.0
2024-08-13 19:24:26,756 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 10600, loss[loss=0.09253, beats_loss=0.008732, ecapa_loss=0.0001937, whisper_loss=0.08186, over 15305.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01078, ecapa_loss=0.0001649, whisper_loss=0.0901, over 3848511.42 frames. ], batch size: 61, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:24:29,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2279810.0, ans=0.2
2024-08-13 19:24:37,318 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-13 19:24:46,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2279910.0, ans=0.1
2024-08-13 19:24:47,098 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0
2024-08-13 19:24:51,672 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 from AS
2024-08-13 19:24:56,572 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0
2024-08-13 19:25:08,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2280010.0, ans=0.125
2024-08-13 19:25:13,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2280010.0, ans=0.0
2024-08-13 19:25:22,213 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 from AS
2024-08-13 19:25:28,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2280110.0, ans=0.1
2024-08-13 19:25:41,528 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.06 vs. limit=15.0
2024-08-13 19:25:51,502 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 10650, loss[loss=0.1079, beats_loss=0.01065, ecapa_loss=0.0001686, whisper_loss=0.09556, over 20894.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01074, ecapa_loss=0.0001634, whisper_loss=0.09046, over 3849050.30 frames. ], batch size: 84, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:25:58,393 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 10 from Vox, 35 from AS
2024-08-13 19:25:59,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2280310.0, ans=0.0
2024-08-13 19:26:05,609 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-08-13 19:26:08,691 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 from AS
2024-08-13 19:26:15,377 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.293e+01 2.581e+01 2.913e+01 4.333e+01, threshold=5.161e+01, percent-clipped=0.0
2024-08-13 19:26:15,580 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 from AS
2024-08-13 19:26:19,813 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.70 vs. limit=10.0
2024-08-13 19:26:21,863 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 from AS
2024-08-13 19:26:22,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2280410.0, ans=0.0
2024-08-13 19:26:25,003 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 from AS
2024-08-13 19:26:40,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2280610.0, ans=0.125
2024-08-13 19:27:02,762 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.55 vs. limit=6.0
2024-08-13 19:27:13,044 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 19:27:13,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2280810.0, ans=0.125
2024-08-13 19:27:13,864 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 10700, loss[loss=0.09222, beats_loss=0.01201, ecapa_loss=0.0001524, whisper_loss=0.07869, over 16438.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01078, ecapa_loss=0.0001622, whisper_loss=0.09053, over 3857718.70 frames. ], batch size: 67, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:27:14,047 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 19 from Vox, 44 from AS
2024-08-13 19:27:22,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2280810.0, ans=0.125
2024-08-13 19:27:33,235 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 22 from Vox, 31 from AS
2024-08-13 19:28:03,891 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 28 from LS+wenet, 19 from Vox, 35 from AS
2024-08-13 19:28:14,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2281110.0, ans=0.0
2024-08-13 19:28:25,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2281210.0, ans=0.1
2024-08-13 19:28:34,655 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 10750, loss[loss=0.1026, beats_loss=0.01061, ecapa_loss=0.0001621, whisper_loss=0.09034, over 17190.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0108, ecapa_loss=0.000163, whisper_loss=0.0919, over 3888330.33 frames. ], batch size: 69, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:28:42,122 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 24 from LS+wenet, 19 from Vox, 22 from AS
2024-08-13 19:28:57,093 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 2.479e+01 2.782e+01 3.163e+01 7.452e+01, threshold=5.564e+01, percent-clipped=1.0
2024-08-13 19:28:58,783 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.97 vs. limit=22.5
2024-08-13 19:29:04,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2281410.0, ans=0.1
2024-08-13 19:29:06,167 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.07 vs. limit=6.0
2024-08-13 19:29:07,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2281510.0, ans=0.0
2024-08-13 19:29:21,957 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.56 vs. limit=15.0
2024-08-13 19:29:46,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2281710.0, ans=0.0
2024-08-13 19:29:55,869 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 10800, loss[loss=0.1115, beats_loss=0.01032, ecapa_loss=0.0001493, whisper_loss=0.0997, over 15672.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01077, ecapa_loss=0.0001631, whisper_loss=0.09226, over 3883129.12 frames. ], batch size: 58, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:29:59,554 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.41 vs. limit=22.5
2024-08-13 19:30:04,864 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 18 from LS+wenet, 31 from Vox, 32 from AS
2024-08-13 19:30:09,922 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 16 from LS+wenet, 14 from Vox, 32 from AS
2024-08-13 19:30:13,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2281910.0, ans=0.1
2024-08-13 19:30:22,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2281910.0, ans=0.125
2024-08-13 19:30:25,864 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 from AS
2024-08-13 19:30:56,105 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0
2024-08-13 19:30:56,130 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.20 vs. limit=12.0
2024-08-13 19:31:01,366 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 22 from LS+wenet, 16 from Vox, 24 from AS
2024-08-13 19:31:02,740 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0
2024-08-13 19:31:16,122 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 10850, loss[loss=0.08617, beats_loss=0.01194, ecapa_loss=0.0001428, whisper_loss=0.0728, over 16830.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0108, ecapa_loss=0.0001634, whisper_loss=0.09177, over 3896116.82 frames. ], batch size: 67, lr: 3.89e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:31:29,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2282310.0, ans=0.125
2024-08-13 19:31:38,779 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.143e+01 2.579e+01 2.775e+01 3.149e+01 7.029e+01, threshold=5.550e+01, percent-clipped=1.0
2024-08-13 19:31:41,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2282410.0, ans=0.125
2024-08-13 19:31:59,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2282510.0, ans=0.2
2024-08-13 19:32:02,324 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 22 from LS+wenet, 23 from Vox, 44 from AS
2024-08-13 19:32:06,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2282610.0, ans=0.125
2024-08-13 19:32:06,571 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.46 vs.
limit=15.0 2024-08-13 19:32:39,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2282810.0, ans=0.0 2024-08-13 19:32:40,028 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 10900, loss[loss=0.1191, beats_loss=0.00918, ecapa_loss=0.0001539, whisper_loss=0.1084, over 24282.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01078, ecapa_loss=0.0001628, whisper_loss=0.09146, over 3921142.80 frames. ], batch size: 92, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:32:59,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2282910.0, ans=0.0 2024-08-13 19:33:00,405 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-13 19:33:09,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2282910.0, ans=0.0 2024-08-13 19:33:13,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2283010.0, ans=0.125 2024-08-13 19:33:19,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2283010.0, ans=0.0 2024-08-13 19:33:31,141 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-13 19:33:49,306 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.75 vs. limit=15.0 2024-08-13 19:33:50,979 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.88 vs. 
limit=15.0 2024-08-13 19:34:00,883 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 10950, loss[loss=0.1248, beats_loss=0.008514, ecapa_loss=0.0001359, whisper_loss=0.1149, over 15170.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01073, ecapa_loss=0.0001614, whisper_loss=0.09193, over 3907319.19 frames. ], batch size: 55, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:34:03,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2283310.0, ans=0.1 2024-08-13 19:34:04,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2283310.0, ans=0.125 2024-08-13 19:34:15,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2283410.0, ans=0.125 2024-08-13 19:34:20,382 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-13 19:34:23,428 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.358e+01 2.590e+01 2.815e+01 3.849e+01, threshold=5.179e+01, percent-clipped=0.0 2024-08-13 19:34:27,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=2283410.0, ans=15.0 2024-08-13 19:35:09,806 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 19:35:22,587 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 11000, loss[loss=0.08638, beats_loss=0.01001, ecapa_loss=0.0002067, whisper_loss=0.0743, over 19263.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01066, ecapa_loss=0.0001619, whisper_loss=0.09224, over 3911149.50 frames. ], batch size: 85, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:35:30,854 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
25 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 19:36:05,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2284010.0, ans=0.2 2024-08-13 19:36:16,824 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 31 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-13 19:36:45,146 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 11050, loss[loss=0.1213, beats_loss=0.007884, ecapa_loss=0.0001816, whisper_loss=0.1116, over 17916.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0106, ecapa_loss=0.000163, whisper_loss=0.09223, over 3941369.81 frames. ], batch size: 71, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:36:51,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2284310.0, ans=0.0 2024-08-13 19:36:58,370 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-13 19:37:08,163 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.114e+01 2.460e+01 2.684e+01 3.016e+01 4.539e+01, threshold=5.368e+01, percent-clipped=0.0 2024-08-13 19:37:27,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2284510.0, ans=0.125 2024-08-13 19:37:34,390 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-13 19:37:37,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2284610.0, ans=0.0 2024-08-13 19:37:50,796 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
34 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 19:37:53,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2284710.0, ans=0.1 2024-08-13 19:37:58,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2284710.0, ans=0.125 2024-08-13 19:38:05,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2284710.0, ans=0.0 2024-08-13 19:38:07,387 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 11100, loss[loss=0.1048, beats_loss=0.009313, ecapa_loss=0.0001623, whisper_loss=0.09391, over 17674.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01058, ecapa_loss=0.0001639, whisper_loss=0.09249, over 3936199.43 frames. ], batch size: 69, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:38:15,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2284810.0, ans=0.125 2024-08-13 19:38:15,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2284810.0, ans=0.125 2024-08-13 19:38:17,858 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-13 19:38:19,938 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-13 19:38:25,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2284910.0, ans=0.125 2024-08-13 19:38:40,457 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
24 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-13 19:38:52,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2285010.0, ans=0.0 2024-08-13 19:38:54,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2285010.0, ans=0.125 2024-08-13 19:39:05,686 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.22 vs. limit=22.5 2024-08-13 19:39:08,171 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-13 19:39:11,339 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 19:39:25,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2285210.0, ans=0.125 2024-08-13 19:39:29,445 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 11150, loss[loss=0.1163, beats_loss=0.009563, ecapa_loss=0.0001824, whisper_loss=0.1049, over 23254.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01064, ecapa_loss=0.0001628, whisper_loss=0.09205, over 3932005.19 frames. 
], batch size: 94, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:39:52,326 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.107e+01 2.378e+01 2.624e+01 2.890e+01 4.520e+01, threshold=5.247e+01, percent-clipped=0.0 2024-08-13 19:39:56,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2285410.0, ans=0.125 2024-08-13 19:40:02,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2285510.0, ans=0.1 2024-08-13 19:40:10,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2285510.0, ans=0.125 2024-08-13 19:40:13,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2285510.0, ans=0.125 2024-08-13 19:40:29,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2285610.0, ans=0.0 2024-08-13 19:40:41,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2285710.0, ans=0.125 2024-08-13 19:40:41,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2285710.0, ans=10.0 2024-08-13 19:40:50,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2285710.0, ans=0.1 2024-08-13 19:40:52,830 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 11200, loss[loss=0.1262, beats_loss=0.009768, ecapa_loss=0.0001401, whisper_loss=0.1151, over 23064.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01071, ecapa_loss=0.0001632, whisper_loss=0.09172, over 3919161.68 frames. 
], batch size: 89, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:41:46,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2286110.0, ans=0.1 2024-08-13 19:42:10,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2286210.0, ans=0.0 2024-08-13 19:42:12,419 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.14 vs. limit=10.0 2024-08-13 19:42:13,126 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 11250, loss[loss=0.1043, beats_loss=0.01125, ecapa_loss=0.0001485, whisper_loss=0.09153, over 22181.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01076, ecapa_loss=0.0001621, whisper_loss=0.09221, over 3942841.54 frames. ], batch size: 85, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:42:34,630 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+01 2.479e+01 2.663e+01 3.017e+01 9.282e+01, threshold=5.327e+01, percent-clipped=2.0 2024-08-13 19:42:40,518 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.70 vs. limit=15.0 2024-08-13 19:42:44,651 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
35 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-13 19:42:45,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2286510.0, ans=0.0 2024-08-13 19:42:55,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2286510.0, ans=0.0 2024-08-13 19:43:18,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2286710.0, ans=0.0 2024-08-13 19:43:25,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2286710.0, ans=0.125 2024-08-13 19:43:31,203 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 11300, loss[loss=0.1039, beats_loss=0.01109, ecapa_loss=0.0001322, whisper_loss=0.09147, over 18251.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.0107, ecapa_loss=0.0001616, whisper_loss=0.09233, over 3901786.30 frames. ], batch size: 71, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:43:46,563 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.67 vs. limit=10.0 2024-08-13 19:43:58,600 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.26 vs. 
limit=15.0 2024-08-13 19:44:23,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2287110.0, ans=0.125 2024-08-13 19:44:24,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2287110.0, ans=0.0 2024-08-13 19:44:50,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2287210.0, ans=0.2 2024-08-13 19:44:53,130 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 11350, loss[loss=0.1163, beats_loss=0.01076, ecapa_loss=0.0001371, whisper_loss=0.1042, over 16319.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0106, ecapa_loss=0.0001625, whisper_loss=0.09252, over 3906435.73 frames. ], batch size: 63, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:44:58,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2287310.0, ans=0.125 2024-08-13 19:44:59,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2287310.0, ans=0.05 2024-08-13 19:45:04,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2287310.0, ans=0.2 2024-08-13 19:45:07,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2287410.0, ans=0.95 2024-08-13 19:45:10,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2287410.0, ans=0.2 2024-08-13 19:45:15,597 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.325e+01 2.679e+01 3.004e+01 6.399e+01, threshold=5.358e+01, percent-clipped=2.0 2024-08-13 19:45:16,584 INFO [scaling.py:1024] (2/4) Whitening: 
name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.11 vs. limit=15.0 2024-08-13 19:45:49,613 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-13 19:46:00,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2287710.0, ans=0.125 2024-08-13 19:46:14,165 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 11400, loss[loss=0.1049, beats_loss=0.01139, ecapa_loss=0.0001843, whisper_loss=0.09163, over 22651.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01063, ecapa_loss=0.0001619, whisper_loss=0.09257, over 3906614.95 frames. ], batch size: 94, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:46:17,666 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 31 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 19:46:22,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=2287810.0, ans=22.5 2024-08-13 19:46:25,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2287810.0, ans=0.125 2024-08-13 19:46:46,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2288010.0, ans=0.0 2024-08-13 19:47:06,290 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-13 19:47:15,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2288110.0, ans=0.125 2024-08-13 19:47:32,792 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 34 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-13 19:47:33,934 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 11450, loss[loss=0.1264, beats_loss=0.008663, ecapa_loss=0.000176, whisper_loss=0.1159, over 18493.00 frames. 
], tot_loss[loss=0.1045, beats_loss=0.0107, ecapa_loss=0.0001624, whisper_loss=0.09215, over 3909770.95 frames. ], batch size: 75, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:47:34,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2288310.0, ans=0.125 2024-08-13 19:47:55,107 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.447e+01 2.680e+01 2.957e+01 5.322e+01, threshold=5.359e+01, percent-clipped=0.0 2024-08-13 19:48:37,147 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=15.0 2024-08-13 19:48:51,793 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 11500, loss[loss=0.1372, beats_loss=0.007798, ecapa_loss=0.0001501, whisper_loss=0.1279, over 18070.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01073, ecapa_loss=0.0001616, whisper_loss=0.09193, over 3912651.76 frames. ], batch size: 65, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:48:51,984 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-13 19:48:52,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2288810.0, ans=0.1 2024-08-13 19:48:56,264 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.88 vs. 
limit=15.0 2024-08-13 19:49:00,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2288810.0, ans=0.05 2024-08-13 19:49:02,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2288810.0, ans=0.125 2024-08-13 19:49:07,159 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 15 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-13 19:49:14,749 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 22 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-13 19:49:27,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2289010.0, ans=0.0 2024-08-13 19:49:34,382 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 19:49:40,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2289110.0, ans=0.125 2024-08-13 19:49:54,265 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.95 vs. limit=6.0 2024-08-13 19:49:54,944 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-13 19:49:55,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2289210.0, ans=0.1 2024-08-13 19:49:57,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2289210.0, ans=0.0 2024-08-13 19:50:13,201 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 11550, loss[loss=0.07781, beats_loss=0.01473, ecapa_loss=0.0001433, whisper_loss=0.06165, over 16472.00 frames. 
], tot_loss[loss=0.1052, beats_loss=0.01061, ecapa_loss=0.0001624, whisper_loss=0.09295, over 3921635.57 frames. ], batch size: 68, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:50:17,089 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.00 vs. limit=22.5 2024-08-13 19:50:36,972 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.552e+01 2.829e+01 3.234e+01 6.675e+01, threshold=5.658e+01, percent-clipped=2.0 2024-08-13 19:50:41,019 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.53 vs. limit=15.0 2024-08-13 19:50:44,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2289410.0, ans=0.0 2024-08-13 19:50:45,504 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-13 19:50:48,300 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 19:50:55,827 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-13 19:50:59,621 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.46 vs. limit=15.0 2024-08-13 19:51:00,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2289510.0, ans=0.07 2024-08-13 19:51:05,500 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-13 19:51:10,099 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-13 19:51:11,416 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
20 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-13 19:51:17,891 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-13 19:51:32,064 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-13 19:51:35,568 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 11600, loss[loss=0.1073, beats_loss=0.01076, ecapa_loss=0.0001232, whisper_loss=0.09535, over 16200.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01061, ecapa_loss=0.0001632, whisper_loss=0.09208, over 3919004.62 frames. ], batch size: 61, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:51:40,559 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-13 19:51:42,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2289810.0, ans=0.1 2024-08-13 19:52:13,404 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 19:52:16,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2290010.0, ans=0.2 2024-08-13 19:52:19,545 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 31 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 19:52:32,658 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-13 19:52:35,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2290110.0, ans=0.125 2024-08-13 19:52:42,318 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
32 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-13 19:52:52,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2290210.0, ans=0.2 2024-08-13 19:52:59,414 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 11650, loss[loss=0.0867, beats_loss=0.0105, ecapa_loss=0.0001545, whisper_loss=0.07466, over 17768.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01068, ecapa_loss=0.0001633, whisper_loss=0.09162, over 3915635.98 frames. ], batch size: 71, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:52:59,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2290310.0, ans=0.0 2024-08-13 19:53:00,476 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.42 vs. limit=6.0 2024-08-13 19:53:22,682 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.408e+01 2.632e+01 2.967e+01 4.953e+01, threshold=5.264e+01, percent-clipped=0.0 2024-08-13 19:53:30,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2290410.0, ans=0.1 2024-08-13 19:53:32,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2290510.0, ans=0.125 2024-08-13 19:53:52,595 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
33 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-13 19:53:56,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2290610.0, ans=0.95 2024-08-13 19:53:56,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2290610.0, ans=0.2 2024-08-13 19:53:57,236 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2024-08-13 19:53:58,376 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.69 vs. limit=15.0 2024-08-13 19:54:03,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2290610.0, ans=0.1 2024-08-13 19:54:23,580 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 11700, loss[loss=0.11, beats_loss=0.01171, ecapa_loss=0.0001655, whisper_loss=0.09661, over 23111.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01081, ecapa_loss=0.000162, whisper_loss=0.09164, over 3942615.81 frames. ], batch size: 93, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:54:52,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2290910.0, ans=0.0 2024-08-13 19:55:29,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2291210.0, ans=0.0 2024-08-13 19:55:46,655 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 11750, loss[loss=0.1212, beats_loss=0.0116, ecapa_loss=0.0001315, whisper_loss=0.1083, over 22609.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01085, ecapa_loss=0.0001615, whisper_loss=0.09156, over 3949313.64 frames. 
], batch size: 88, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:55:47,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2291310.0, ans=0.0 2024-08-13 19:55:58,846 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.20 vs. limit=12.0 2024-08-13 19:56:01,808 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 27 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-13 19:56:10,246 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 14 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-13 19:56:11,473 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.398e+01 2.617e+01 2.949e+01 4.150e+01, threshold=5.234e+01, percent-clipped=0.0 2024-08-13 19:56:12,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2291410.0, ans=0.07 2024-08-13 19:56:16,118 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-13 19:56:49,467 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 13 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-13 19:57:09,086 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 11800, loss[loss=0.09105, beats_loss=0.01006, ecapa_loss=0.0001541, whisper_loss=0.07944, over 21059.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01084, ecapa_loss=0.0001607, whisper_loss=0.09106, over 3924390.90 frames. ], batch size: 83, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:57:27,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2291910.0, ans=0.0 2024-08-13 19:57:29,212 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
27 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-13 19:57:54,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=2292010.0, ans=15.0 2024-08-13 19:57:57,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2292010.0, ans=0.1 2024-08-13 19:57:58,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2292110.0, ans=0.125 2024-08-13 19:57:58,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2292110.0, ans=0.125 2024-08-13 19:58:14,552 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 19:58:32,700 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 11850, loss[loss=0.08329, beats_loss=0.01222, ecapa_loss=0.0002038, whisper_loss=0.06904, over 21875.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01088, ecapa_loss=0.0001616, whisper_loss=0.09102, over 3924494.86 frames. ], batch size: 98, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:58:32,827 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 28 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-13 19:58:35,693 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 29 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-13 19:58:44,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2292310.0, ans=0.125 2024-08-13 19:58:45,343 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
27 from LS+wenet, 15 from Vox, 12 fro AS 2024-08-13 19:58:55,611 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.456e+01 2.721e+01 2.965e+01 7.443e+01, threshold=5.443e+01, percent-clipped=2.0 2024-08-13 19:59:09,999 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 26 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-13 19:59:16,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2292510.0, ans=0.125 2024-08-13 19:59:32,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2292610.0, ans=0.125 2024-08-13 19:59:47,125 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-13 19:59:52,988 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 11900, loss[loss=0.1169, beats_loss=0.007681, ecapa_loss=0.0001641, whisper_loss=0.1075, over 22406.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01079, ecapa_loss=0.0001629, whisper_loss=0.09193, over 3936639.33 frames. ], batch size: 88, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:00:01,638 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 20:00:06,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2292810.0, ans=0.0 2024-08-13 20:00:21,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2292910.0, ans=0.0 2024-08-13 20:00:24,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2293010.0, ans=0.125 2024-08-13 20:00:43,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2293110.0, ans=0.0 2024-08-13 20:00:47,460 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-13 20:01:02,293 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-13 20:01:05,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2293210.0, ans=0.0 2024-08-13 20:01:12,063 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 14 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 20:01:13,031 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 11950, loss[loss=0.07332, beats_loss=0.01217, ecapa_loss=0.0001692, whisper_loss=0.05946, over 16646.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01072, ecapa_loss=0.0001624, whisper_loss=0.09222, over 3930566.21 frames. ], batch size: 68, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:01:35,782 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.292e+01 2.621e+01 2.963e+01 5.710e+01, threshold=5.241e+01, percent-clipped=1.0 2024-08-13 20:01:51,994 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
31 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-13 20:01:55,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2293510.0, ans=0.125 2024-08-13 20:01:56,711 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-13 20:02:04,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2293610.0, ans=0.1 2024-08-13 20:02:22,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2293710.0, ans=0.5 2024-08-13 20:02:24,054 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.24 vs. limit=12.0 2024-08-13 20:02:25,084 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-13 20:02:32,115 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 12000, loss[loss=0.1214, beats_loss=0.01058, ecapa_loss=0.0001556, whisper_loss=0.1093, over 20909.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01077, ecapa_loss=0.000162, whisper_loss=0.09237, over 3930028.84 frames. ], batch size: 81, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:02:32,115 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-13 20:03:12,713 INFO [train_multi_KD3.py:1149] (2/4) Epoch 16, validation on ASR_libri: loss=0.2535, beats_loss=0, ecapa_loss=0.0005542, whisper_loss=0.248, over 922467.00 frames. 2024-08-13 20:03:33,746 INFO [train_multi_KD3.py:1149] (2/4) Epoch 16, validation on SV_voxceleb1: loss=0.004415, beats_loss=0, ecapa_loss=0.0004415, whisper_loss=0, over 939242.00 frames. 
2024-08-13 20:04:01,174 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8545, 4.0791, 2.7680, 4.6733], device='cuda:2') 2024-08-13 20:04:09,160 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.6206, 1.9739, 1.7889, 1.0020], device='cuda:2') 2024-08-13 20:05:21,808 INFO [train_multi_KD3.py:1149] (2/4) Epoch 16, validation on AT_audioset: loss=0.02371, beats_loss=0.02371, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 20:05:21,811 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-13 20:05:23,302 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-13 20:05:32,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2293810.0, ans=0.125 2024-08-13 20:05:32,600 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.80 vs. limit=12.0 2024-08-13 20:05:34,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2293810.0, ans=0.125 2024-08-13 20:05:53,259 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 29 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-13 20:06:04,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2294010.0, ans=0.125 2024-08-13 20:06:08,075 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=15.0 2024-08-13 20:06:44,310 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 12050, loss[loss=0.08694, beats_loss=0.01044, ecapa_loss=0.0001877, whisper_loss=0.07463, over 21345.00 frames. 
], tot_loss[loss=0.1047, beats_loss=0.01072, ecapa_loss=0.0001627, whisper_loss=0.09237, over 3930154.67 frames. ], batch size: 91, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:06:44,480 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-13 20:06:44,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2294310.0, ans=0.125 2024-08-13 20:06:47,653 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 22 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-13 20:07:07,490 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.498e+01 2.752e+01 3.060e+01 1.760e+02, threshold=5.504e+01, percent-clipped=2.0 2024-08-13 20:07:32,785 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.20 vs. limit=15.0 2024-08-13 20:07:33,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2294610.0, ans=0.125 2024-08-13 20:07:36,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2294610.0, ans=0.125 2024-08-13 20:07:44,188 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 16 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-13 20:07:45,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2294610.0, ans=0.125 2024-08-13 20:07:47,129 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.44 vs. limit=22.5 2024-08-13 20:07:49,641 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-13 20:07:49,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2294710.0, ans=0.2 2024-08-13 20:08:04,041 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-13 20:08:08,956 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 12100, loss[loss=0.1232, beats_loss=0.00869, ecapa_loss=0.0001948, whisper_loss=0.1126, over 21510.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01072, ecapa_loss=0.0001619, whisper_loss=0.09282, over 3910223.71 frames. ], batch size: 86, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:08:14,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2294810.0, ans=0.125 2024-08-13 20:08:22,288 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-13 20:08:27,644 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 27 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-13 20:08:31,037 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 25 from LS+wenet, 16 from Vox, 15 fro AS 2024-08-13 20:08:45,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2295010.0, ans=0.0 2024-08-13 20:08:53,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2295010.0, ans=0.05 2024-08-13 20:09:00,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2295110.0, ans=0.0 2024-08-13 20:09:03,541 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.45 vs. limit=10.0 2024-08-13 20:09:25,893 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
22 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-13 20:09:29,301 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 12150, loss[loss=0.1161, beats_loss=0.01064, ecapa_loss=0.0001584, whisper_loss=0.1039, over 23549.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01071, ecapa_loss=0.000161, whisper_loss=0.09256, over 3900919.46 frames. ], batch size: 91, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:09:38,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2295310.0, ans=0.125 2024-08-13 20:09:52,641 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.320e+01 2.600e+01 2.810e+01 5.391e+01, threshold=5.201e+01, percent-clipped=0.0 2024-08-13 20:10:22,665 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-13 20:10:55,515 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 12200, loss[loss=0.08668, beats_loss=0.01034, ecapa_loss=0.0001592, whisper_loss=0.07475, over 19272.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.0107, ecapa_loss=0.0001615, whisper_loss=0.09235, over 3883360.15 frames. ], batch size: 76, lr: 3.88e-03, grad_scale: 1.152921504606847e+18 2024-08-13 20:11:01,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2295810.0, ans=0.125 2024-08-13 20:11:08,346 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.05 vs. limit=22.5 2024-08-13 20:11:48,323 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-13 20:11:59,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2296210.0, ans=0.125 2024-08-13 20:12:14,923 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
27 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-13 20:12:16,956 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 12250, loss[loss=0.09658, beats_loss=0.01195, ecapa_loss=0.0001773, whisper_loss=0.08286, over 22424.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0107, ecapa_loss=0.0001609, whisper_loss=0.09218, over 3881196.46 frames. ], batch size: 92, lr: 3.88e-03, grad_scale: 1.152921504606847e+18 2024-08-13 20:12:29,026 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 20:12:32,291 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-13 20:12:38,787 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.422e+01 2.679e+01 2.912e+01 4.424e+01, threshold=5.358e+01, percent-clipped=0.0 2024-08-13 20:12:54,996 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 27 from LS+wenet, 27 from Vox, 19 fro AS 2024-08-13 20:13:02,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2296510.0, ans=0.0 2024-08-13 20:13:09,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2296610.0, ans=0.125 2024-08-13 20:13:19,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2296710.0, ans=15.0 2024-08-13 20:13:30,031 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 20 from LS+wenet, 14 from Vox, 56 fro AS 2024-08-13 20:13:31,489 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
23 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-13 20:13:32,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2296710.0, ans=0.0 2024-08-13 20:13:35,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2296810.0, ans=0.0 2024-08-13 20:13:36,236 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 12300, loss[loss=0.1217, beats_loss=0.009428, ecapa_loss=0.0001646, whisper_loss=0.1106, over 19376.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01068, ecapa_loss=0.0001614, whisper_loss=0.09169, over 3886936.43 frames. ], batch size: 77, lr: 3.88e-03, grad_scale: 1.152921504606847e+18 2024-08-13 20:13:40,870 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 20:13:45,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2296810.0, ans=0.125 2024-08-13 20:13:49,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2296810.0, ans=0.2 2024-08-13 20:13:54,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2296910.0, ans=0.2 2024-08-13 20:14:01,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=2296910.0, ans=15.0 2024-08-13 20:14:08,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2297010.0, ans=15.0 2024-08-13 20:14:42,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2297210.0, ans=0.0 2024-08-13 20:14:49,253 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, 
metric=17.19 vs. limit=22.5 2024-08-13 20:14:55,161 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 12350, loss[loss=0.1031, beats_loss=0.01147, ecapa_loss=0.0001569, whisper_loss=0.09007, over 15418.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01063, ecapa_loss=0.0001606, whisper_loss=0.09213, over 3900341.56 frames. ], batch size: 59, lr: 3.87e-03, grad_scale: 1.152921504606847e+18 2024-08-13 20:14:59,451 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 20:15:18,223 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.796e+01 2.416e+01 2.631e+01 3.029e+01 4.449e+01, threshold=5.262e+01, percent-clipped=0.0 2024-08-13 20:15:34,295 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-13 20:15:36,740 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-13 20:15:47,101 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 38 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 20:15:53,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2297610.0, ans=0.2 2024-08-13 20:15:57,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2297610.0, ans=0.125 2024-08-13 20:15:58,607 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 35 from Vox, 35 fro AS 2024-08-13 20:16:13,158 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
24 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-13 20:16:18,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2297810.0, ans=0.125 2024-08-13 20:16:18,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2297810.0, ans=0.05 2024-08-13 20:16:18,954 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 12400, loss[loss=0.1003, beats_loss=0.009303, ecapa_loss=0.0002197, whisper_loss=0.08877, over 14984.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01064, ecapa_loss=0.0001607, whisper_loss=0.09199, over 3918167.74 frames. ], batch size: 62, lr: 3.87e-03, grad_scale: 1.152921504606847e+18 2024-08-13 20:16:19,128 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-13 20:16:33,523 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 37 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 20:16:34,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2297910.0, ans=0.0 2024-08-13 20:16:44,324 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=15.0 2024-08-13 20:16:50,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2298010.0, ans=0.5 2024-08-13 20:16:59,335 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 20:17:14,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2298110.0, ans=0.125 2024-08-13 20:17:14,763 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.69 vs. 
limit=22.5 2024-08-13 20:17:15,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2298110.0, ans=0.125 2024-08-13 20:17:27,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2298210.0, ans=0.2 2024-08-13 20:17:36,443 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 12450, loss[loss=0.08295, beats_loss=0.01065, ecapa_loss=0.0001839, whisper_loss=0.07046, over 20276.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01067, ecapa_loss=0.0001613, whisper_loss=0.09114, over 3905280.30 frames. ], batch size: 85, lr: 3.87e-03, grad_scale: 1.152921504606847e+18 2024-08-13 20:17:40,358 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.63 vs. limit=15.0 2024-08-13 20:17:57,289 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.445e+01 2.805e+01 3.307e+01 1.075e+02, threshold=5.611e+01, percent-clipped=1.0 2024-08-13 20:18:00,898 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-13 20:18:12,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2298510.0, ans=0.125 2024-08-13 20:18:14,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2298510.0, ans=0.1 2024-08-13 20:18:18,998 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-13 20:18:53,373 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 12500, loss[loss=0.1119, beats_loss=0.008945, ecapa_loss=0.0001561, whisper_loss=0.1014, over 22683.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01063, ecapa_loss=0.0001619, whisper_loss=0.09138, over 3881887.36 frames. 
], batch size: 89, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:18:54,829 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.316e-01 2024-08-13 20:19:05,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2298810.0, ans=0.125 2024-08-13 20:19:08,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2298810.0, ans=0.0 2024-08-13 20:19:22,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2298910.0, ans=0.125 2024-08-13 20:19:27,176 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 33 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-13 20:19:53,464 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 20:19:59,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2299210.0, ans=0.0 2024-08-13 20:20:06,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2299210.0, ans=0.09899494936611666 2024-08-13 20:20:13,667 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 12550, loss[loss=0.1118, beats_loss=0.01162, ecapa_loss=0.0002, whisper_loss=0.09815, over 21706.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01074, ecapa_loss=0.0001617, whisper_loss=0.09122, over 3933725.84 frames. ], batch size: 91, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:20:13,841 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-13 20:20:28,454 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.39 vs. 
limit=22.5 2024-08-13 20:20:37,943 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.461e+01 2.791e+01 3.123e+01 5.243e+01, threshold=5.581e+01, percent-clipped=0.0 2024-08-13 20:20:38,199 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 22 from LS+wenet, 20 from Vox, 16 fro AS 2024-08-13 20:20:45,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2299510.0, ans=0.125 2024-08-13 20:20:45,259 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.89 vs. limit=12.0 2024-08-13 20:20:56,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2299510.0, ans=0.2 2024-08-13 20:20:57,477 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.71 vs. limit=15.0 2024-08-13 20:21:14,677 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 14 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 20:21:32,920 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 12600, loss[loss=0.1034, beats_loss=0.01111, ecapa_loss=0.0001729, whisper_loss=0.09058, over 22787.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01073, ecapa_loss=0.0001599, whisper_loss=0.09165, over 3933365.91 frames. ], batch size: 93, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:21:33,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2299810.0, ans=0.0 2024-08-13 20:21:49,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2299910.0, ans=0.125 2024-08-13 20:22:05,041 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
19 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-13 20:22:18,127 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-13 20:22:50,737 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 12650, loss[loss=0.1103, beats_loss=0.01196, ecapa_loss=0.0001406, whisper_loss=0.09693, over 23231.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0108, ecapa_loss=0.0001601, whisper_loss=0.09159, over 3932406.58 frames. ], batch size: 90, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:23:09,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2300410.0, ans=0.0 2024-08-13 20:23:13,512 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.379e+01 2.634e+01 2.946e+01 5.512e+01, threshold=5.269e+01, percent-clipped=0.0 2024-08-13 20:23:43,328 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-13 20:23:53,882 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 28 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-13 20:24:01,527 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 35 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-13 20:24:02,760 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-13 20:24:07,558 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 12700, loss[loss=0.0824, beats_loss=0.01238, ecapa_loss=0.0001411, whisper_loss=0.06861, over 16776.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01088, ecapa_loss=0.0001597, whisper_loss=0.0912, over 3941462.57 frames. ], batch size: 66, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:24:13,691 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
30 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-13 20:24:27,689 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.37 vs. limit=15.0 2024-08-13 20:24:28,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2300910.0, ans=0.125 2024-08-13 20:24:51,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2301010.0, ans=0.2 2024-08-13 20:24:53,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2301010.0, ans=0.0 2024-08-13 20:24:55,499 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.49 vs. limit=22.5 2024-08-13 20:24:56,439 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=9.651e+01 2024-08-13 20:25:13,170 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.11 vs. limit=22.5 2024-08-13 20:25:26,879 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 12750, loss[loss=0.1133, beats_loss=0.01173, ecapa_loss=0.0001344, whisper_loss=0.1003, over 23628.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01087, ecapa_loss=0.0001607, whisper_loss=0.09165, over 3931273.25 frames. ], batch size: 92, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:25:30,768 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.08 vs. limit=15.0 2024-08-13 20:25:33,898 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.61 vs. 
limit=15.0 2024-08-13 20:25:50,294 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.318e+01 2.587e+01 2.901e+01 2.435e+02, threshold=5.175e+01, percent-clipped=0.0 2024-08-13 20:26:12,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2301610.0, ans=10.0 2024-08-13 20:26:15,377 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 30 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 20:26:19,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2301610.0, ans=0.2 2024-08-13 20:26:26,483 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 33 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-13 20:26:38,361 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.95 vs. limit=15.0 2024-08-13 20:26:40,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2301710.0, ans=0.0 2024-08-13 20:26:45,272 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 12800, loss[loss=0.09413, beats_loss=0.01369, ecapa_loss=0.0001708, whisper_loss=0.07874, over 21732.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0109, ecapa_loss=0.0001615, whisper_loss=0.0918, over 3917178.75 frames. ], batch size: 89, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:26:46,902 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
26 from LS+wenet, 16 from Vox, 31 from AS 2024-08-13 20:26:48,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2301810.0, ans=0.0 2024-08-13 20:27:00,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2301910.0, ans=0.125 2024-08-13 20:27:14,311 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 28 from Vox, 40 from AS 2024-08-13 20:27:23,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2302010.0, ans=0.125 2024-08-13 20:27:33,015 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 from AS 2024-08-13 20:27:52,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2302210.0, ans=0.125 2024-08-13 20:28:03,078 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 12850, loss[loss=0.08623, beats_loss=0.009843, ecapa_loss=0.0001597, whisper_loss=0.07479, over 14051.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.011, ecapa_loss=0.0001623, whisper_loss=0.09013, over 3899642.41 frames. 
], batch size: 54, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:28:26,589 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.352e+01 2.567e+01 2.932e+01 5.459e+01, threshold=5.134e+01, percent-clipped=2.0 2024-08-13 20:28:29,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2302410.0, ans=0.0 2024-08-13 20:28:37,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2302510.0, ans=0.1 2024-08-13 20:28:57,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2302610.0, ans=0.1 2024-08-13 20:29:03,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2302710.0, ans=0.0 2024-08-13 20:29:10,888 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 21 from Vox, 35 from AS 2024-08-13 20:29:20,479 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 12900, loss[loss=0.08794, beats_loss=0.01111, ecapa_loss=0.0002005, whisper_loss=0.07482, over 16622.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01094, ecapa_loss=0.0001625, whisper_loss=0.09078, over 3869610.73 frames. ], batch size: 71, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:29:51,526 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.13 vs. limit=15.0 2024-08-13 20:29:52,396 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
18 from LS+wenet, 16 from Vox, 32 from AS 2024-08-13 20:29:54,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2303010.0, ans=0.125 2024-08-13 20:29:56,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2303010.0, ans=0.0 2024-08-13 20:30:25,128 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 20:30:38,314 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.29 vs. limit=22.5 2024-08-13 20:30:39,617 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 12950, loss[loss=0.1079, beats_loss=0.009254, ecapa_loss=0.0002011, whisper_loss=0.09662, over 21910.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01086, ecapa_loss=0.000163, whisper_loss=0.09119, over 3882874.55 frames. ], batch size: 89, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:31:01,903 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.283e+01 2.671e+01 2.992e+01 6.489e+01, threshold=5.342e+01, percent-clipped=3.0 2024-08-13 20:31:06,766 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.43 vs. limit=15.0 2024-08-13 20:31:59,080 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 13000, loss[loss=0.1102, beats_loss=0.0112, ecapa_loss=0.0001158, whisper_loss=0.09788, over 15278.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01087, ecapa_loss=0.0001624, whisper_loss=0.09152, over 3891155.08 frames. 
], batch size: 56, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:32:04,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2303810.0, ans=0.1 2024-08-13 20:32:20,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2303910.0, ans=0.2 2024-08-13 20:32:23,033 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 30 from LS+wenet, 18 from Vox, 29 from AS 2024-08-13 20:32:24,770 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 19 from LS+wenet, 27 from Vox, 39 from AS 2024-08-13 20:32:48,430 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.94 vs. limit=22.5 2024-08-13 20:32:57,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2304110.0, ans=0.125 2024-08-13 20:33:11,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2304210.0, ans=0.0 2024-08-13 20:33:12,552 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 24 from Vox, 36 from AS 2024-08-13 20:33:23,568 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 13050, loss[loss=0.1013, beats_loss=0.01083, ecapa_loss=0.0001681, whisper_loss=0.08874, over 17996.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01096, ecapa_loss=0.0001615, whisper_loss=0.09061, over 3871184.69 frames. ], batch size: 72, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:33:26,155 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 28 from LS+wenet, 17 from Vox, 28 from AS 2024-08-13 20:33:27,838 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.76 vs. 
limit=15.0 2024-08-13 20:33:32,713 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.687e-02 2024-08-13 20:33:53,531 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 25 from Vox, 40 from AS 2024-08-13 20:33:55,491 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.312e+01 2.609e+01 2.956e+01 5.975e+01, threshold=5.219e+01, percent-clipped=1.0 2024-08-13 20:34:15,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2304510.0, ans=0.125 2024-08-13 20:34:17,805 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.82 vs. limit=12.0 2024-08-13 20:34:20,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2304510.0, ans=0.125 2024-08-13 20:35:16,066 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 13100, loss[loss=0.09588, beats_loss=0.01138, ecapa_loss=0.0001479, whisper_loss=0.08302, over 17840.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01097, ecapa_loss=0.0001606, whisper_loss=0.09017, over 3876293.27 frames. ], batch size: 73, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:35:36,150 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.82 vs. limit=15.0 2024-08-13 20:35:42,431 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. 
limit=6.0 2024-08-13 20:35:54,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2305010.0, ans=0.125 2024-08-13 20:36:00,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2305010.0, ans=0.035 2024-08-13 20:36:26,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2305110.0, ans=0.1 2024-08-13 20:36:40,951 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.03 vs. limit=10.0 2024-08-13 20:36:45,803 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 22 from Vox, 40 from AS 2024-08-13 20:37:02,485 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 13150, loss[loss=0.1281, beats_loss=0.008272, ecapa_loss=0.0001529, whisper_loss=0.1183, over 23542.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01094, ecapa_loss=0.0001594, whisper_loss=0.09064, over 3914984.72 frames. ], batch size: 89, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:37:27,010 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 37 from LS+wenet, 15 from Vox, 36 from AS 2024-08-13 20:37:39,115 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.460e+01 2.677e+01 2.995e+01 4.365e+01, threshold=5.353e+01, percent-clipped=0.0 2024-08-13 20:37:45,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2305410.0, ans=0.2 2024-08-13 20:37:54,221 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.78 vs. 
limit=15.0 2024-08-13 20:38:00,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2305510.0, ans=0.125 2024-08-13 20:38:01,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2305510.0, ans=0.0 2024-08-13 20:38:08,164 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 13 from LS+wenet, 19 from Vox, 31 from AS 2024-08-13 20:38:27,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2305610.0, ans=0.0 2024-08-13 20:38:44,745 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 15 from Vox, 33 from AS 2024-08-13 20:38:51,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2305710.0, ans=0.2 2024-08-13 20:38:53,743 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=11.04 vs. limit=12.0 2024-08-13 20:39:04,289 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 13200, loss[loss=0.1196, beats_loss=0.009151, ecapa_loss=0.0002026, whisper_loss=0.1084, over 21989.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01075, ecapa_loss=0.0001612, whisper_loss=0.09151, over 3875071.03 frames. ], batch size: 89, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:39:08,890 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 from AS 2024-08-13 20:39:11,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2305810.0, ans=0.125 2024-08-13 20:39:23,756 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
28 from LS+wenet, 24 from Vox, 41 from AS 2024-08-13 20:39:53,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2305910.0, ans=0.1 2024-08-13 20:39:56,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2306010.0, ans=0.1 2024-08-13 20:40:47,309 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2306210.0, ans=0.125 2024-08-13 20:41:02,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2306210.0, ans=0.2 2024-08-13 20:41:12,206 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 13250, loss[loss=0.1195, beats_loss=0.009944, ecapa_loss=0.0001662, whisper_loss=0.1079, over 20231.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01076, ecapa_loss=0.0001619, whisper_loss=0.09103, over 3868348.53 frames. ], batch size: 80, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:41:22,992 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 17 from LS+wenet, 18 from Vox, 32 from AS 2024-08-13 20:41:34,462 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.21 vs. limit=15.0 2024-08-13 20:41:48,139 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.383e+01 2.626e+01 2.998e+01 4.392e+01, threshold=5.251e+01, percent-clipped=0.0 2024-08-13 20:42:30,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2306610.0, ans=0.5 2024-08-13 20:42:36,219 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
20 from LS+wenet, 8 from Vox, 26 from AS 2024-08-13 20:42:41,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2306710.0, ans=0.0 2024-08-13 20:42:42,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2306710.0, ans=0.0 2024-08-13 20:42:44,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2306710.0, ans=0.125 2024-08-13 20:42:52,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2306810.0, ans=0.125 2024-08-13 20:42:53,574 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 13300, loss[loss=0.09917, beats_loss=0.01018, ecapa_loss=0.0001828, whisper_loss=0.08716, over 17738.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01075, ecapa_loss=0.0001625, whisper_loss=0.09117, over 3877552.61 frames. ], batch size: 73, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:42:55,033 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 from AS 2024-08-13 20:43:01,651 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08281490951776505, model_norm_threshold=52.51310729980469 2024-08-13 20:43:01,873 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.983e+04, grad_sumsq=6.983e+04, orig_rms_sq=1.000e+00 2024-08-13 20:43:08,041 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.18 vs. 
limit=12.0 2024-08-13 20:43:19,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2306910.0, ans=0.1 2024-08-13 20:43:21,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2306910.0, ans=0.2 2024-08-13 20:43:25,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2307010.0, ans=0.2 2024-08-13 20:43:45,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2307110.0, ans=0.1 2024-08-13 20:43:49,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2307110.0, ans=0.125 2024-08-13 20:43:49,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2307110.0, ans=0.07 2024-08-13 20:44:01,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2307210.0, ans=0.125 2024-08-13 20:44:13,894 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 13350, loss[loss=0.103, beats_loss=0.01007, ecapa_loss=0.0002111, whisper_loss=0.09084, over 20960.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01075, ecapa_loss=0.0001632, whisper_loss=0.09156, over 3885023.37 frames. 
], batch size: 91, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:44:23,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2307310.0, ans=0.125 2024-08-13 20:44:28,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2307310.0, ans=0.0 2024-08-13 20:44:36,368 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.80 vs. limit=15.0 2024-08-13 20:44:36,633 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.44 vs. limit=15.0 2024-08-13 20:44:38,201 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.444e+01 2.749e+01 3.154e+01 6.341e+02, threshold=5.498e+01, percent-clipped=1.0 2024-08-13 20:44:44,038 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 14 from LS+wenet, 21 from Vox, 31 from AS 2024-08-13 20:44:52,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2307510.0, ans=0.125 2024-08-13 20:44:59,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2307510.0, ans=0.125 2024-08-13 20:45:17,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2307710.0, ans=0.125 2024-08-13 20:45:33,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2307810.0, ans=0.125 2024-08-13 20:45:34,187 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 13400, loss[loss=0.09741, beats_loss=0.01275, ecapa_loss=0.0001481, whisper_loss=0.08318, over 21185.00 frames. 
], tot_loss[loss=0.1043, beats_loss=0.01072, ecapa_loss=0.0001634, whisper_loss=0.09193, over 3911607.48 frames. ], batch size: 84, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:45:45,421 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 18 from Vox, 27 from AS 2024-08-13 20:45:46,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2307810.0, ans=0.0 2024-08-13 20:45:47,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2307810.0, ans=0.1 2024-08-13 20:45:53,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2307910.0, ans=0.125 2024-08-13 20:45:55,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2307910.0, ans=0.125 2024-08-13 20:45:55,524 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.73 vs. limit=10.0 2024-08-13 20:45:58,097 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 13 from LS+wenet, 12 from Vox, 34 from AS 2024-08-13 20:46:00,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2307910.0, ans=0.125 2024-08-13 20:46:07,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2308010.0, ans=0.125 2024-08-13 20:46:15,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2308010.0, ans=0.1 2024-08-13 20:46:17,466 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.21 vs. 
limit=15.0 2024-08-13 20:46:20,381 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0 2024-08-13 20:46:21,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2308110.0, ans=0.0 2024-08-13 20:46:23,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2308110.0, ans=0.125 2024-08-13 20:46:23,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2308110.0, ans=0.0 2024-08-13 20:46:29,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2308110.0, ans=10.0 2024-08-13 20:46:29,953 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 15 from Vox, 26 from AS 2024-08-13 20:46:30,462 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=15.0 2024-08-13 20:46:36,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=2308110.0, ans=15.0 2024-08-13 20:46:39,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2308210.0, ans=0.95 2024-08-13 20:46:40,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2308210.0, ans=0.125 2024-08-13 20:46:54,596 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 13450, loss[loss=0.1002, beats_loss=0.01387, ecapa_loss=0.0001286, whisper_loss=0.085, over 23649.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01076, ecapa_loss=0.0001624, whisper_loss=0.09186, over 3908158.40 frames. 
], batch size: 93, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:47:04,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2308310.0, ans=0.125 2024-08-13 20:47:11,661 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.03 vs. limit=15.0 2024-08-13 20:47:16,810 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 20 from Vox, 19 from AS 2024-08-13 20:47:17,878 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.358e+01 2.676e+01 3.282e+01 1.336e+02, threshold=5.353e+01, percent-clipped=2.0 2024-08-13 20:47:20,164 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 23 from LS+wenet, 11 from Vox, 25 from AS 2024-08-13 20:47:28,051 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 21 from Vox, 32 from AS 2024-08-13 20:47:53,082 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.35 vs. limit=12.0 2024-08-13 20:48:02,337 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 28 from LS+wenet, 27 from Vox, 40 from AS 2024-08-13 20:48:13,727 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 13500, loss[loss=0.1252, beats_loss=0.007621, ecapa_loss=0.0001905, whisper_loss=0.1156, over 17153.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01066, ecapa_loss=0.0001634, whisper_loss=0.0925, over 3916858.57 frames. 
], batch size: 66, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:48:27,765 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 20:48:47,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2309010.0, ans=0.0 2024-08-13 20:48:59,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2309010.0, ans=0.015 2024-08-13 20:49:05,207 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 26 from Vox, 30 from AS 2024-08-13 20:49:17,769 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 19 from Vox, 28 from AS 2024-08-13 20:49:20,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2309210.0, ans=0.1 2024-08-13 20:49:34,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2309310.0, ans=0.1 2024-08-13 20:49:36,483 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 13550, loss[loss=0.113, beats_loss=0.0117, ecapa_loss=0.0001568, whisper_loss=0.09972, over 22678.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01069, ecapa_loss=0.0001642, whisper_loss=0.0925, over 3903607.48 frames. 
], batch size: 92, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:49:40,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2309310.0, ans=0.125 2024-08-13 20:50:00,955 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.439e+01 2.648e+01 3.076e+01 1.090e+02, threshold=5.296e+01, percent-clipped=1.0 2024-08-13 20:50:08,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2309510.0, ans=0.0 2024-08-13 20:50:41,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2309710.0, ans=0.125 2024-08-13 20:50:56,514 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 13600, loss[loss=0.1016, beats_loss=0.009491, ecapa_loss=0.0001892, whisper_loss=0.09018, over 19453.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01078, ecapa_loss=0.0001629, whisper_loss=0.09211, over 3897668.02 frames. ], batch size: 79, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:51:06,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2309810.0, ans=0.125 2024-08-13 20:51:11,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2309910.0, ans=0.125 2024-08-13 20:51:12,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2309910.0, ans=0.0 2024-08-13 20:51:18,750 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.86 vs. 
limit=15.0 2024-08-13 20:51:28,702 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.052e+00 2024-08-13 20:51:33,716 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-08-13 20:51:44,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2310110.0, ans=0.0 2024-08-13 20:51:45,739 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 26 from LS+wenet, 16 from Vox, 23 from AS 2024-08-13 20:51:54,383 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.434e+01 2024-08-13 20:51:54,634 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.01 vs. limit=15.0 2024-08-13 20:51:55,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2310110.0, ans=0.125 2024-08-13 20:52:00,350 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.51 vs. limit=15.0 2024-08-13 20:52:01,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2310210.0, ans=0.0 2024-08-13 20:52:04,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2310210.0, ans=0.125 2024-08-13 20:52:09,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2310210.0, ans=0.125 2024-08-13 20:52:10,314 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 
30 from LS+wenet, 27 from Vox, 38 from AS 2024-08-13 20:52:15,399 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 13650, loss[loss=0.08774, beats_loss=0.01382, ecapa_loss=0.0001347, whisper_loss=0.07257, over 22797.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01091, ecapa_loss=0.0001621, whisper_loss=0.09124, over 3905607.39 frames. ], batch size: 93, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:52:23,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2310310.0, ans=0.125 2024-08-13 20:52:26,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2310310.0, ans=0.07 2024-08-13 20:52:37,722 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.402e+01 2.676e+01 3.034e+01 5.771e+01, threshold=5.352e+01, percent-clipped=1.0 2024-08-13 20:53:05,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2310610.0, ans=0.09899494936611666 2024-08-13 20:53:06,150 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=22.5 2024-08-13 20:53:14,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2310710.0, ans=0.125 2024-08-13 20:53:20,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2310710.0, ans=0.1 2024-08-13 20:53:30,197 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 13700, loss[loss=0.1287, beats_loss=0.01021, ecapa_loss=0.0001448, whisper_loss=0.1171, over 22954.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01087, ecapa_loss=0.0001616, whisper_loss=0.09127, over 3911569.40 frames. 
], batch size: 88, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:54:02,018 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 17 from Vox, 24 from AS 2024-08-13 20:54:10,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2311010.0, ans=0.1 2024-08-13 20:54:16,682 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 from AS 2024-08-13 20:54:17,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2311110.0, ans=0.1 2024-08-13 20:54:19,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2311110.0, ans=0.125 2024-08-13 20:54:38,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2311210.0, ans=0.125 2024-08-13 20:54:43,302 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 13750, loss[loss=0.08982, beats_loss=0.008262, ecapa_loss=0.0001277, whisper_loss=0.08028, over 15817.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0108, ecapa_loss=0.0001608, whisper_loss=0.09168, over 3909681.37 frames. ], batch size: 57, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:54:52,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2311310.0, ans=0.09899494936611666 2024-08-13 20:54:59,127 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 20 from Vox, 28 from AS 2024-08-13 20:55:01,624 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.27 vs. 
limit=15.0 2024-08-13 20:55:05,060 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.299e+01 2.662e+01 2.929e+01 4.195e+01, threshold=5.323e+01, percent-clipped=0.0 2024-08-13 20:55:15,002 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 20:55:16,011 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-13 20:55:42,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2311710.0, ans=0.0 2024-08-13 20:55:53,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2311710.0, ans=0.125 2024-08-13 20:55:57,183 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 13800, loss[loss=0.08296, beats_loss=0.01076, ecapa_loss=0.0001725, whisper_loss=0.07047, over 15748.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01075, ecapa_loss=0.0001624, whisper_loss=0.09169, over 3866910.42 frames. ], batch size: 65, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:55:58,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=2311810.0, ans=0.05 2024-08-13 20:56:31,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2312010.0, ans=0.0 2024-08-13 20:56:44,501 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.98 vs. limit=15.0 2024-08-13 20:56:46,641 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 23 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-13 20:56:58,427 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 19 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-13 20:57:06,362 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-13 20:57:08,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2312310.0, ans=0.1 2024-08-13 20:57:08,964 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 13850, loss[loss=0.09239, beats_loss=0.009053, ecapa_loss=0.0001712, whisper_loss=0.08162, over 13972.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01075, ecapa_loss=0.0001627, whisper_loss=0.09188, over 3867068.91 frames. ], batch size: 56, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:57:16,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2312310.0, ans=0.2 2024-08-13 20:57:25,676 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 24 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-13 20:57:29,794 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 19 from LS+wenet, 16 from Vox, 48 fro AS 2024-08-13 20:57:31,123 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.385e+01 2.789e+01 3.325e+01 4.881e+01, threshold=5.578e+01, percent-clipped=0.0 2024-08-13 20:57:33,154 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.10 vs. limit=15.0 2024-08-13 20:58:10,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2312710.0, ans=0.125 2024-08-13 20:58:18,075 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-13 20:58:21,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2312810.0, ans=0.0 2024-08-13 20:58:21,967 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 13900, loss[loss=0.1132, beats_loss=0.009809, ecapa_loss=0.0001791, whisper_loss=0.1016, over 19547.00 frames. 
], tot_loss[loss=0.1046, beats_loss=0.01073, ecapa_loss=0.0001622, whisper_loss=0.09229, over 3909946.32 frames. ], batch size: 78, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:58:32,322 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.93 vs. limit=15.0 2024-08-13 20:59:34,197 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 13950, loss[loss=0.08622, beats_loss=0.01267, ecapa_loss=0.0001416, whisper_loss=0.07214, over 21206.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01079, ecapa_loss=0.0001616, whisper_loss=0.09178, over 3903498.12 frames. ], batch size: 88, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:59:47,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2313410.0, ans=0.125 2024-08-13 20:59:56,280 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.112e+01 2.456e+01 2.773e+01 3.183e+01 5.275e+01, threshold=5.547e+01, percent-clipped=0.0 2024-08-13 20:59:57,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2313410.0, ans=0.0 2024-08-13 21:00:00,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2313410.0, ans=0.025 2024-08-13 21:00:07,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2313510.0, ans=0.0 2024-08-13 21:00:09,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2313510.0, ans=0.1 2024-08-13 21:00:16,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2313510.0, ans=0.125 2024-08-13 21:00:23,539 INFO 
[train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-13 21:00:23,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2313610.0, ans=0.0 2024-08-13 21:00:27,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2313610.0, ans=0.015 2024-08-13 21:00:33,089 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 18 from LS+wenet, 32 from Vox, 41 fro AS 2024-08-13 21:00:40,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2313710.0, ans=0.1 2024-08-13 21:00:48,815 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 14000, loss[loss=0.1072, beats_loss=0.01031, ecapa_loss=0.0001621, whisper_loss=0.09524, over 19492.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01085, ecapa_loss=0.0001598, whisper_loss=0.09159, over 3941715.41 frames. ], batch size: 80, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:00:59,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2313810.0, ans=0.125 2024-08-13 21:01:00,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2313810.0, ans=0.0 2024-08-13 21:01:12,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2313910.0, ans=0.0 2024-08-13 21:01:39,958 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 15 from Vox, 50 fro AS 2024-08-13 21:01:46,216 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 27 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-13 21:01:50,758 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
18 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-13 21:02:02,957 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 14050, loss[loss=0.1069, beats_loss=0.009939, ecapa_loss=0.0001802, whisper_loss=0.09518, over 20994.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01091, ecapa_loss=0.0001588, whisper_loss=0.09135, over 3916877.05 frames. ], batch size: 86, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:02:12,537 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-13 21:02:24,085 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.387e+01 2.626e+01 2.873e+01 4.572e+01, threshold=5.251e+01, percent-clipped=0.0 2024-08-13 21:02:24,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2314410.0, ans=0.09899494936611666 2024-08-13 21:02:29,910 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-13 21:02:37,505 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-13 21:02:55,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2314610.0, ans=0.95 2024-08-13 21:03:14,655 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.97 vs. limit=22.5 2024-08-13 21:03:15,013 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 14100, loss[loss=0.0743, beats_loss=0.01288, ecapa_loss=0.0001821, whisper_loss=0.0596, over 21569.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01084, ecapa_loss=0.0001583, whisper_loss=0.09191, over 3901326.77 frames. ], batch size: 89, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:03:15,201 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
26 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 21:03:25,068 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.04 vs. limit=22.5 2024-08-13 21:03:28,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2314910.0, ans=0.0 2024-08-13 21:03:30,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2314910.0, ans=0.0 2024-08-13 21:03:36,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2314910.0, ans=0.125 2024-08-13 21:03:42,425 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-13 21:03:42,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2314910.0, ans=0.0 2024-08-13 21:03:46,383 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 16 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-13 21:03:52,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2315010.0, ans=0.0 2024-08-13 21:04:07,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2315110.0, ans=0.5 2024-08-13 21:04:15,671 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-13 21:04:23,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2315210.0, ans=0.0 2024-08-13 21:04:26,861 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 14150, loss[loss=0.08718, beats_loss=0.01083, ecapa_loss=0.0001676, whisper_loss=0.07467, over 22487.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01088, ecapa_loss=0.0001597, whisper_loss=0.09103, over 3906762.49 frames. ], batch size: 92, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:04:30,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2315310.0, ans=0.125 2024-08-13 21:04:31,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2315310.0, ans=0.1 2024-08-13 21:04:32,396 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 13 from LS+wenet, 22 from Vox, 18 fro AS 2024-08-13 21:04:42,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2315410.0, ans=0.2 2024-08-13 21:04:46,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2315410.0, ans=0.125 2024-08-13 21:04:49,561 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.465e+01 2.633e+01 2.994e+01 4.985e+01, threshold=5.265e+01, percent-clipped=0.0 2024-08-13 21:04:51,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2315410.0, ans=0.1 2024-08-13 21:05:41,762 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 14200, loss[loss=0.1119, beats_loss=0.01024, ecapa_loss=0.0001464, whisper_loss=0.1002, over 23261.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01084, ecapa_loss=0.0001598, whisper_loss=0.09114, over 3919631.41 frames. ], batch size: 92, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:05:56,252 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2024-08-13 21:05:58,955 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
21 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-13 21:05:59,747 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.82 vs. limit=6.0 2024-08-13 21:06:00,760 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 21:06:04,832 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.72 vs. limit=12.0 2024-08-13 21:06:09,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2315910.0, ans=0.0 2024-08-13 21:06:25,265 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.43 vs. limit=12.0 2024-08-13 21:06:27,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2316110.0, ans=0.0 2024-08-13 21:06:27,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2316110.0, ans=0.0 2024-08-13 21:06:43,288 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 21:06:54,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2316210.0, ans=0.0 2024-08-13 21:07:00,903 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 14250, loss[loss=0.114, beats_loss=0.01024, ecapa_loss=0.0001765, whisper_loss=0.102, over 19476.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01082, ecapa_loss=0.0001594, whisper_loss=0.09139, over 3928452.56 frames. 
], batch size: 77, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:07:20,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2316410.0, ans=0.125 2024-08-13 21:07:24,717 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.489e+01 2.737e+01 3.188e+01 4.877e+01, threshold=5.475e+01, percent-clipped=0.0 2024-08-13 21:07:26,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2316410.0, ans=0.125 2024-08-13 21:07:28,568 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.550e+01 2024-08-13 21:07:30,569 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-13 21:07:52,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2316610.0, ans=0.0 2024-08-13 21:08:05,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2316710.0, ans=0.0 2024-08-13 21:08:06,822 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 20 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 21:08:08,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2316710.0, ans=0.125 2024-08-13 21:08:11,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2316710.0, ans=0.125 2024-08-13 21:08:17,090 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 14300, loss[loss=0.1065, beats_loss=0.01109, ecapa_loss=0.0001726, whisper_loss=0.09373, over 20203.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01084, ecapa_loss=0.0001604, whisper_loss=0.09093, over 3948555.99 frames. 
], batch size: 81, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:08:26,895 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2024-08-13 21:08:29,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2316810.0, ans=0.1 2024-08-13 21:08:34,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2316910.0, ans=0.125 2024-08-13 21:08:38,189 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2024-08-13 21:08:53,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2317010.0, ans=0.125 2024-08-13 21:09:02,217 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.57 vs. limit=15.0 2024-08-13 21:09:13,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2317110.0, ans=0.0 2024-08-13 21:09:33,244 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 14350, loss[loss=0.09403, beats_loss=0.01111, ecapa_loss=0.0001729, whisper_loss=0.08119, over 17730.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01082, ecapa_loss=0.0001608, whisper_loss=0.09072, over 3926914.16 frames. ], batch size: 71, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:09:44,125 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 11 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-13 21:09:46,659 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
29 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-13 21:09:56,039 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.382e+01 2.716e+01 3.017e+01 1.009e+02, threshold=5.432e+01, percent-clipped=2.0 2024-08-13 21:09:58,850 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-13 21:10:02,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2317510.0, ans=0.0 2024-08-13 21:10:15,673 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-13 21:10:33,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2317710.0, ans=0.1 2024-08-13 21:10:33,928 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.79 vs. limit=12.0 2024-08-13 21:10:37,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2317710.0, ans=0.0 2024-08-13 21:10:49,430 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 14400, loss[loss=0.1037, beats_loss=0.0123, ecapa_loss=0.0001485, whisper_loss=0.08994, over 22768.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01079, ecapa_loss=0.000161, whisper_loss=0.09106, over 3930429.63 frames. 
], batch size: 92, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:10:53,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2317810.0, ans=0.125 2024-08-13 21:11:10,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2317910.0, ans=0.0 2024-08-13 21:11:15,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2317910.0, ans=0.0 2024-08-13 21:11:31,370 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.13 vs. limit=10.0 2024-08-13 21:11:42,802 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 21:11:49,795 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.61 vs. limit=6.0 2024-08-13 21:11:53,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2318210.0, ans=0.1 2024-08-13 21:12:05,854 INFO [train_multi_KD3.py:1116] (2/4) Epoch 16, batch 14450, loss[loss=0.09976, beats_loss=0.01131, ecapa_loss=0.0001663, whisper_loss=0.08679, over 22065.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0108, ecapa_loss=0.0001616, whisper_loss=0.09141, over 3928027.64 frames. ], batch size: 91, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:12:11,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2318310.0, ans=0.125 2024-08-13 21:12:16,540 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 21:12:28,027 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.45 vs. limit=15.0 2024-08-13 21:12:28,343 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.376e+01 2.680e+01 3.028e+01 6.046e+01, threshold=5.360e+01, percent-clipped=1.0 2024-08-13 21:12:33,840 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 10 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 21:12:45,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2318510.0, ans=0.0 2024-08-13 21:12:45,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2318510.0, ans=0.0 2024-08-13 21:12:52,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2318610.0, ans=0.125 2024-08-13 21:12:53,799 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 17 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-13 21:12:54,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2318610.0, ans=0.2 2024-08-13 21:12:56,644 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 18 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-13 21:13:47,650 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 0, loss[loss=0.07867, beats_loss=0.01345, ecapa_loss=0.0001466, whisper_loss=0.06376, over 18812.00 frames. ], tot_loss[loss=0.07867, beats_loss=0.01345, ecapa_loss=0.0001466, whisper_loss=0.06376, over 18812.00 frames. 
], batch size: 76, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:13:47,650 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-13 21:14:29,919 INFO [train_multi_KD3.py:1149] (2/4) Epoch 17, validation on ASR_libri: loss=0.2533, beats_loss=0, ecapa_loss=0.0005592, whisper_loss=0.2478, over 922467.00 frames. 2024-08-13 21:14:46,255 INFO [train_multi_KD3.py:1149] (2/4) Epoch 17, validation on SV_voxceleb1: loss=0.004509, beats_loss=0, ecapa_loss=0.0004509, whisper_loss=0, over 939242.00 frames. 2024-08-13 21:16:46,391 INFO [train_multi_KD3.py:1149] (2/4) Epoch 17, validation on AT_audioset: loss=0.02361, beats_loss=0.02361, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 21:16:46,395 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-13 21:16:51,099 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 21:17:04,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2318730.0, ans=0.125 2024-08-13 21:17:24,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2318830.0, ans=0.125 2024-08-13 21:17:35,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2318830.0, ans=0.125 2024-08-13 21:17:48,589 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-13 21:18:14,795 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-13 21:18:23,279 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.46 vs. 
limit=6.0 2024-08-13 21:18:58,533 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 50, loss[loss=0.1088, beats_loss=0.008814, ecapa_loss=0.0001812, whisper_loss=0.09822, over 20995.00 frames. ], tot_loss[loss=0.09931, beats_loss=0.01022, ecapa_loss=0.0001693, whisper_loss=0.08739, over 865823.46 frames. ], batch size: 84, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:19:00,707 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0 2024-08-13 21:19:29,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2319330.0, ans=0.125 2024-08-13 21:19:46,262 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-13 21:19:55,548 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.055e+01 2.701e+01 3.109e+01 3.430e+01 6.788e+01, threshold=6.217e+01, percent-clipped=2.0 2024-08-13 21:20:01,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2319430.0, ans=0.0 2024-08-13 21:20:04,067 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.87 vs. limit=15.0 2024-08-13 21:20:20,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2319530.0, ans=0.125 2024-08-13 21:21:00,582 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 100, loss[loss=0.09836, beats_loss=0.01221, ecapa_loss=0.0001527, whisper_loss=0.08463, over 20561.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01, ecapa_loss=0.000163, whisper_loss=0.08945, over 1544033.66 frames. 
], batch size: 82, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:21:21,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2319730.0, ans=0.1 2024-08-13 21:21:21,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2319730.0, ans=0.5 2024-08-13 21:21:27,859 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.10 vs. limit=12.0 2024-08-13 21:21:32,069 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 22 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-13 21:21:46,056 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-13 21:21:48,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2319930.0, ans=0.125 2024-08-13 21:21:53,523 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.51 vs. limit=15.0 2024-08-13 21:22:04,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2319930.0, ans=0.2 2024-08-13 21:22:08,665 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 36 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-13 21:22:23,228 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.27 vs. 
limit=15.0 2024-08-13 21:22:30,938 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.095e+05 2024-08-13 21:22:38,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2320130.0, ans=0.125 2024-08-13 21:22:40,303 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.14 vs. limit=22.5 2024-08-13 21:22:41,952 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.06 vs. limit=22.5 2024-08-13 21:22:51,653 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-13 21:22:52,638 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 150, loss[loss=0.1002, beats_loss=0.01025, ecapa_loss=0.0001572, whisper_loss=0.08835, over 18398.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01001, ecapa_loss=0.0001613, whisper_loss=0.08979, over 2034600.00 frames. ], batch size: 73, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:23:32,375 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.074e+01 2.591e+01 2.910e+01 3.226e+01 4.259e+01, threshold=5.820e+01, percent-clipped=0.0 2024-08-13 21:23:42,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2320530.0, ans=0.1 2024-08-13 21:23:46,481 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.00 vs. limit=15.0 2024-08-13 21:23:50,019 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. 
limit=6.0 2024-08-13 21:24:00,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2320630.0, ans=0.2 2024-08-13 21:24:06,803 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-13 21:24:12,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2320630.0, ans=0.125 2024-08-13 21:24:16,137 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 200, loss[loss=0.1119, beats_loss=0.01166, ecapa_loss=0.0001741, whisper_loss=0.09851, over 18200.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01014, ecapa_loss=0.0001624, whisper_loss=0.09037, over 2405981.12 frames. ], batch size: 70, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:24:17,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2320730.0, ans=0.1 2024-08-13 21:24:29,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2320730.0, ans=0.0 2024-08-13 21:24:54,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2320930.0, ans=0.2 2024-08-13 21:25:18,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=2321030.0, ans=0.05 2024-08-13 21:25:23,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2321130.0, ans=0.2 2024-08-13 21:25:25,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2321130.0, ans=0.125 2024-08-13 21:25:34,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2321130.0, 
ans=10.0 2024-08-13 21:25:38,708 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 250, loss[loss=0.1077, beats_loss=0.009353, ecapa_loss=0.0001471, whisper_loss=0.09686, over 16796.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01022, ecapa_loss=0.0001639, whisper_loss=0.09154, over 2724222.26 frames. ], batch size: 62, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:26:17,149 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.338e+01 2.625e+01 3.056e+01 3.496e+02, threshold=5.250e+01, percent-clipped=1.0 2024-08-13 21:26:26,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2321430.0, ans=0.1 2024-08-13 21:26:32,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2321530.0, ans=0.0 2024-08-13 21:26:37,255 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 33 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-13 21:26:40,215 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 31 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-13 21:27:01,548 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.49 vs. limit=15.0 2024-08-13 21:27:02,165 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 300, loss[loss=0.1003, beats_loss=0.008959, ecapa_loss=0.0001989, whisper_loss=0.08936, over 13963.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01034, ecapa_loss=0.0001632, whisper_loss=0.09143, over 2939698.22 frames. ], batch size: 57, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:27:26,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2321830.0, ans=0.0 2024-08-13 21:27:42,636 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
21 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-13 21:27:42,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2321930.0, ans=0.125 2024-08-13 21:27:42,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2321930.0, ans=0.125 2024-08-13 21:27:51,037 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0 2024-08-13 21:28:29,139 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 350, loss[loss=0.09616, beats_loss=0.009682, ecapa_loss=0.0001744, whisper_loss=0.08474, over 13783.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01038, ecapa_loss=0.0001618, whisper_loss=0.09079, over 3119041.13 frames. ], batch size: 55, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:28:40,241 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-13 21:28:40,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2322230.0, ans=0.0 2024-08-13 21:28:47,852 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 21:28:51,324 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 21:29:08,367 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.409e+01 2.739e+01 3.112e+01 5.763e+01, threshold=5.479e+01, percent-clipped=3.0 2024-08-13 21:29:24,456 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
16 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-13 21:29:42,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2322630.0, ans=0.1 2024-08-13 21:29:42,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2322630.0, ans=0.2 2024-08-13 21:29:57,531 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 400, loss[loss=0.1097, beats_loss=0.01066, ecapa_loss=0.0001866, whisper_loss=0.09715, over 18802.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01041, ecapa_loss=0.0001609, whisper_loss=0.09012, over 3241065.32 frames. ], batch size: 76, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:30:04,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2322730.0, ans=0.125 2024-08-13 21:30:09,142 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.35 vs. limit=15.0 2024-08-13 21:30:11,583 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 14 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-13 21:30:25,537 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
32 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-13 21:30:28,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2322830.0, ans=0.125 2024-08-13 21:30:30,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2322830.0, ans=0.1 2024-08-13 21:30:32,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2322930.0, ans=0.1 2024-08-13 21:30:42,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2322930.0, ans=0.0 2024-08-13 21:31:12,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2323130.0, ans=0.0 2024-08-13 21:31:17,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2323130.0, ans=0.125 2024-08-13 21:31:25,639 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 450, loss[loss=0.09593, beats_loss=0.01075, ecapa_loss=0.0001835, whisper_loss=0.08335, over 16106.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.000161, whisper_loss=0.09013, over 3373388.03 frames. ], batch size: 68, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:31:27,996 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 34 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-13 21:31:28,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2323230.0, ans=0.0 2024-08-13 21:31:30,303 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.751e-02 2024-08-13 21:31:52,888 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=16.46 vs. 
limit=15.0 2024-08-13 21:31:54,288 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.112e-03 2024-08-13 21:31:57,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2323330.0, ans=0.0 2024-08-13 21:32:06,914 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.385e+01 2.600e+01 2.999e+01 5.733e+01, threshold=5.200e+01, percent-clipped=1.0 2024-08-13 21:32:10,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2323430.0, ans=0.125 2024-08-13 21:32:11,643 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 29 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-13 21:32:11,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2323430.0, ans=0.125 2024-08-13 21:32:11,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2323430.0, ans=0.125 2024-08-13 21:32:18,529 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 21:32:25,958 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-13 21:32:32,420 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.42 vs. limit=12.0 2024-08-13 21:32:43,037 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 21:32:43,949 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-13 21:32:52,068 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 500, loss[loss=0.0823, beats_loss=0.01115, ecapa_loss=0.0001455, whisper_loss=0.06969, over 15550.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01043, ecapa_loss=0.0001602, whisper_loss=0.09142, over 3509073.76 frames. ], batch size: 61, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:33:00,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2323730.0, ans=0.125 2024-08-13 21:33:05,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2323730.0, ans=0.0 2024-08-13 21:33:15,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2323830.0, ans=0.125 2024-08-13 21:33:20,575 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0 2024-08-13 21:33:29,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2323930.0, ans=0.0 2024-08-13 21:33:41,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2324030.0, ans=0.125 2024-08-13 21:33:46,860 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 21:33:49,678 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 25 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-13 21:33:56,935 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.24 vs. limit=6.0 2024-08-13 21:33:59,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2324130.0, ans=0.0 2024-08-13 21:34:13,713 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 550, loss[loss=0.105, beats_loss=0.009386, ecapa_loss=0.0001832, whisper_loss=0.09383, over 21350.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01046, ecapa_loss=0.0001605, whisper_loss=0.09098, over 3586059.91 frames. ], batch size: 89, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:34:13,974 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 21 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-13 21:34:15,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2324230.0, ans=0.1 2024-08-13 21:34:16,923 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-13 21:34:30,684 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2024-08-13 21:34:42,073 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.22 vs. limit=15.0 2024-08-13 21:34:50,288 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=15.0 2024-08-13 21:34:51,302 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.278e+01 2.510e+01 2.744e+01 4.092e+01, threshold=5.020e+01, percent-clipped=0.0 2024-08-13 21:34:52,378 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.67 vs. limit=15.0 2024-08-13 21:34:58,241 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.34 vs. limit=12.0 2024-08-13 21:35:26,036 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
13 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-13 21:35:27,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=2324630.0, ans=10.0 2024-08-13 21:35:30,768 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.32 vs. limit=15.0 2024-08-13 21:35:31,572 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 600, loss[loss=0.1211, beats_loss=0.008737, ecapa_loss=0.0001456, whisper_loss=0.1109, over 16185.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0105, ecapa_loss=0.0001587, whisper_loss=0.09139, over 3629261.29 frames. ], batch size: 59, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:35:32,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2324730.0, ans=0.0 2024-08-13 21:35:34,256 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-13 21:35:39,928 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.83 vs. limit=22.5 2024-08-13 21:35:43,574 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-13 21:35:49,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2324830.0, ans=0.125 2024-08-13 21:35:57,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2324830.0, ans=0.2 2024-08-13 21:36:27,332 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=8.286e-01 2024-08-13 21:36:28,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2325130.0, ans=0.125 2024-08-13 21:36:32,955 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.13 vs. limit=12.0 2024-08-13 21:36:33,019 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.73 vs. limit=10.0 2024-08-13 21:36:33,504 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-13 21:36:37,270 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-13 21:36:38,411 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 650, loss[loss=0.1031, beats_loss=0.009714, ecapa_loss=0.0001812, whisper_loss=0.09153, over 19535.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01053, ecapa_loss=0.0001594, whisper_loss=0.09141, over 3698779.49 frames. ], batch size: 78, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:36:55,090 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.68 vs. limit=6.0 2024-08-13 21:36:58,007 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 21:37:03,226 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 20 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-13 21:37:09,469 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.389e+01 2.704e+01 3.013e+01 8.978e+01, threshold=5.408e+01, percent-clipped=2.0 2024-08-13 21:37:14,535 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 22 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-13 21:37:16,586 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.84 vs. limit=15.0 2024-08-13 21:37:27,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2325530.0, ans=0.125 2024-08-13 21:37:31,872 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-13 21:37:33,090 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-13 21:37:33,905 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.00 vs. limit=15.0 2024-08-13 21:37:34,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2325630.0, ans=0.05 2024-08-13 21:37:38,253 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-13 21:37:39,663 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-13 21:37:43,674 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 700, loss[loss=0.08337, beats_loss=0.009889, ecapa_loss=0.000111, whisper_loss=0.07237, over 17151.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01054, ecapa_loss=0.0001588, whisper_loss=0.09122, over 3722437.77 frames. 
], batch size: 62, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:37:51,810 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 21:38:39,595 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-13 21:38:40,972 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 21:38:41,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2326130.0, ans=0.1 2024-08-13 21:38:46,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2326130.0, ans=0.0 2024-08-13 21:38:48,969 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 750, loss[loss=0.07697, beats_loss=0.0127, ecapa_loss=0.0001558, whisper_loss=0.06271, over 13575.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01059, ecapa_loss=0.0001581, whisper_loss=0.09172, over 3762343.85 frames. ], batch size: 53, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:39:19,872 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.349e+01 2.525e+01 2.805e+01 4.000e+01, threshold=5.049e+01, percent-clipped=0.0 2024-08-13 21:39:45,641 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2024-08-13 21:39:54,222 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 800, loss[loss=0.0909, beats_loss=0.0109, ecapa_loss=0.0001471, whisper_loss=0.07853, over 14348.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01057, ecapa_loss=0.000159, whisper_loss=0.09099, over 3749423.81 frames. 
], batch size: 56, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:39:54,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2326730.0, ans=0.1 2024-08-13 21:40:00,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2326730.0, ans=0.0 2024-08-13 21:40:14,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2326830.0, ans=0.125 2024-08-13 21:40:24,405 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 26 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-13 21:40:33,311 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-13 21:40:42,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2327030.0, ans=0.125 2024-08-13 21:40:43,612 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-13 21:40:48,131 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2024-08-13 21:40:59,025 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 850, loss[loss=0.1231, beats_loss=0.009177, ecapa_loss=0.0001349, whisper_loss=0.1126, over 16210.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01067, ecapa_loss=0.0001578, whisper_loss=0.09044, over 3764512.44 frames. ], batch size: 59, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:41:04,821 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.70 vs. limit=22.5 2024-08-13 21:41:13,757 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
26 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-13 21:41:24,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2327430.0, ans=0.1 2024-08-13 21:41:26,133 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.24 vs. limit=15.0 2024-08-13 21:41:30,582 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.619e+01 2.387e+01 2.596e+01 3.123e+01 5.757e+01, threshold=5.192e+01, percent-clipped=1.0 2024-08-13 21:41:37,874 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0 2024-08-13 21:41:40,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2327530.0, ans=0.1 2024-08-13 21:41:49,046 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-13 21:41:49,536 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.57 vs. limit=6.0 2024-08-13 21:41:52,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2327630.0, ans=0.125 2024-08-13 21:41:54,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2327630.0, ans=0.2 2024-08-13 21:42:04,924 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 900, loss[loss=0.092, beats_loss=0.01458, ecapa_loss=0.0001414, whisper_loss=0.076, over 18725.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0107, ecapa_loss=0.000157, whisper_loss=0.09024, over 3763750.27 frames. 
], batch size: 75, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:42:23,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2327830.0, ans=0.1 2024-08-13 21:42:37,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2327930.0, ans=0.0 2024-08-13 21:42:37,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2327930.0, ans=0.2 2024-08-13 21:42:44,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2328030.0, ans=0.07 2024-08-13 21:42:54,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2328030.0, ans=0.125 2024-08-13 21:43:06,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2328130.0, ans=0.125 2024-08-13 21:43:09,689 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 950, loss[loss=0.09397, beats_loss=0.01259, ecapa_loss=0.000149, whisper_loss=0.07989, over 22771.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01067, ecapa_loss=0.0001567, whisper_loss=0.09005, over 3798116.16 frames. ], batch size: 92, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:43:12,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2328230.0, ans=0.2 2024-08-13 21:43:15,185 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 21:43:15,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2328230.0, ans=0.125 2024-08-13 21:43:23,791 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
21 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 21:43:41,097 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.367e+01 2.632e+01 3.016e+01 5.732e+01, threshold=5.263e+01, percent-clipped=3.0 2024-08-13 21:43:53,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2328530.0, ans=0.0 2024-08-13 21:44:00,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2328530.0, ans=0.0 2024-08-13 21:44:03,312 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-13 21:44:15,217 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 1000, loss[loss=0.1061, beats_loss=0.01275, ecapa_loss=0.0001795, whisper_loss=0.09157, over 22971.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01067, ecapa_loss=0.0001565, whisper_loss=0.09014, over 3783329.64 frames. ], batch size: 93, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:44:23,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2328730.0, ans=0.125 2024-08-13 21:44:28,483 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-13 21:44:28,897 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.10 vs. limit=22.5 2024-08-13 21:44:30,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2328830.0, ans=0.125 2024-08-13 21:44:51,344 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.74 vs. 
limit=6.0 2024-08-13 21:45:21,015 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 1050, loss[loss=0.1126, beats_loss=0.0113, ecapa_loss=0.0001091, whisper_loss=0.1002, over 21886.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01062, ecapa_loss=0.0001556, whisper_loss=0.09035, over 3789711.46 frames. ], batch size: 79, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:45:37,792 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-13 21:45:52,012 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.420e+01 2.664e+01 2.978e+01 4.899e+01, threshold=5.328e+01, percent-clipped=0.0 2024-08-13 21:45:56,653 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.21 vs. limit=22.5 2024-08-13 21:46:08,574 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.99 vs. limit=15.0 2024-08-13 21:46:09,273 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 21 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-13 21:46:15,794 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 32 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 21:46:26,204 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 1100, loss[loss=0.09512, beats_loss=0.01109, ecapa_loss=0.0001341, whisper_loss=0.08269, over 17195.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01068, ecapa_loss=0.0001552, whisper_loss=0.09042, over 3832449.61 frames. 
], batch size: 68, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:46:26,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2329730.0, ans=0.2 2024-08-13 21:46:27,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2329730.0, ans=0.0 2024-08-13 21:46:48,883 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-13 21:46:56,723 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 21:46:59,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2329930.0, ans=0.125 2024-08-13 21:47:01,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2329930.0, ans=0.125 2024-08-13 21:47:02,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2329930.0, ans=0.125 2024-08-13 21:47:07,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2330030.0, ans=0.1 2024-08-13 21:47:10,112 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 17 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 21:47:16,037 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.86 vs. limit=15.0 2024-08-13 21:47:21,986 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
18 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 21:47:26,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2330130.0, ans=0.125 2024-08-13 21:47:32,346 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 1150, loss[loss=0.1019, beats_loss=0.0102, ecapa_loss=0.0001358, whisper_loss=0.09038, over 19922.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01066, ecapa_loss=0.0001545, whisper_loss=0.09128, over 3845012.36 frames. ], batch size: 79, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:47:34,880 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-13 21:47:37,650 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 21:47:40,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2330230.0, ans=0.0 2024-08-13 21:48:00,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2330430.0, ans=0.125 2024-08-13 21:48:03,683 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.427e+01 2.716e+01 3.117e+01 1.034e+02, threshold=5.432e+01, percent-clipped=2.0 2024-08-13 21:48:07,940 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-13 21:48:11,131 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2024-08-13 21:48:18,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2330530.0, ans=10.0 2024-08-13 21:48:29,140 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
21 from LS+wenet, 9 from Vox, 24 fro AS 2024-08-13 21:48:35,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2330630.0, ans=0.0 2024-08-13 21:48:37,920 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 1200, loss[loss=0.09065, beats_loss=0.01322, ecapa_loss=0.0001742, whisper_loss=0.07568, over 16034.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01061, ecapa_loss=0.0001555, whisper_loss=0.09131, over 3831013.93 frames. ], batch size: 69, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:48:48,372 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.93 vs. limit=15.0 2024-08-13 21:49:15,406 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.57 vs. limit=15.0 2024-08-13 21:49:17,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2331030.0, ans=0.0 2024-08-13 21:49:19,394 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.52 vs. limit=12.0 2024-08-13 21:49:29,173 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 21:49:43,296 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 1250, loss[loss=0.1023, beats_loss=0.01061, ecapa_loss=0.0001464, whisper_loss=0.09018, over 18151.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01051, ecapa_loss=0.000156, whisper_loss=0.0914, over 3842544.93 frames. 
], batch size: 71, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:49:53,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2331230.0, ans=0.125 2024-08-13 21:49:58,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2331330.0, ans=0.125 2024-08-13 21:50:08,139 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-13 21:50:09,319 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-13 21:50:09,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2331430.0, ans=0.0 2024-08-13 21:50:14,256 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.232e+01 2.491e+01 2.766e+01 6.956e+01, threshold=4.983e+01, percent-clipped=1.0 2024-08-13 21:50:14,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2331430.0, ans=0.125 2024-08-13 21:50:25,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2331530.0, ans=0.125 2024-08-13 21:50:48,682 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 1300, loss[loss=0.08984, beats_loss=0.01096, ecapa_loss=0.0001048, whisper_loss=0.07784, over 16071.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0105, ecapa_loss=0.0001556, whisper_loss=0.09135, over 3854360.20 frames. ], batch size: 59, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:51:01,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2331830.0, ans=0.125 2024-08-13 21:51:07,002 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
23 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-13 21:51:15,044 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 22 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-13 21:51:32,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2332030.0, ans=0.125 2024-08-13 21:51:35,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2332030.0, ans=0.125 2024-08-13 21:51:53,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2332130.0, ans=0.0 2024-08-13 21:51:56,370 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 1350, loss[loss=0.1016, beats_loss=0.01162, ecapa_loss=0.000155, whisper_loss=0.08845, over 21381.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01054, ecapa_loss=0.0001564, whisper_loss=0.09098, over 3849895.96 frames. ], batch size: 85, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:52:02,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2332230.0, ans=0.0 2024-08-13 21:52:12,959 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 21:52:27,317 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=22.5 2024-08-13 21:52:30,708 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.343e+01 2.685e+01 2.934e+01 4.089e+01, threshold=5.369e+01, percent-clipped=0.0 2024-08-13 21:52:31,873 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.23 vs. 
limit=15.0 2024-08-13 21:52:36,721 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.73 vs. limit=15.0 2024-08-13 21:52:46,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2332530.0, ans=0.1 2024-08-13 21:53:10,619 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 1400, loss[loss=0.06577, beats_loss=0.01358, ecapa_loss=0.0001292, whisper_loss=0.05089, over 15902.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01052, ecapa_loss=0.0001577, whisper_loss=0.09089, over 3883648.97 frames. ], batch size: 64, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:53:12,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2332730.0, ans=0.125 2024-08-13 21:53:15,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2332730.0, ans=0.0 2024-08-13 21:53:21,156 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-13 21:53:21,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2332730.0, ans=0.1 2024-08-13 21:53:37,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2332830.0, ans=0.125 2024-08-13 21:53:42,408 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
20 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-13 21:53:45,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2332930.0, ans=0.1 2024-08-13 21:53:45,752 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.03 vs. limit=15.0 2024-08-13 21:53:56,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2333030.0, ans=0.0 2024-08-13 21:54:01,468 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-13 21:54:20,753 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 30 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 21:54:24,823 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 1450, loss[loss=0.09091, beats_loss=0.01108, ecapa_loss=0.0001267, whisper_loss=0.07857, over 21114.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01053, ecapa_loss=0.0001574, whisper_loss=0.09037, over 3873880.09 frames. ], batch size: 80, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:54:49,761 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.31 vs. limit=12.0 2024-08-13 21:55:21,931 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.338e+01 2.604e+01 2.874e+01 4.710e+01, threshold=5.208e+01, percent-clipped=0.0 2024-08-13 21:55:27,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2333430.0, ans=0.125 2024-08-13 21:55:28,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2333430.0, ans=0.125 2024-08-13 21:55:42,638 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
28 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-13 21:55:44,370 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 32 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-13 21:55:46,430 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.38 vs. limit=15.0 2024-08-13 21:55:46,987 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-13 21:55:54,283 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-13 21:56:01,153 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 1500, loss[loss=0.1277, beats_loss=0.009649, ecapa_loss=0.0001548, whisper_loss=0.1165, over 21591.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01057, ecapa_loss=0.0001566, whisper_loss=0.08953, over 3875575.28 frames. ], batch size: 84, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:56:01,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2333730.0, ans=0.125 2024-08-13 21:56:14,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2333830.0, ans=0.0 2024-08-13 21:56:23,463 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-13 21:56:32,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2333930.0, ans=0.0 2024-08-13 21:56:42,975 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 21 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-13 21:56:50,417 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.76 vs. 
limit=6.0 2024-08-13 21:57:14,062 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.13 vs. limit=15.0 2024-08-13 21:57:14,637 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 1550, loss[loss=0.1034, beats_loss=0.01043, ecapa_loss=0.0001531, whisper_loss=0.09146, over 22746.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01059, ecapa_loss=0.0001552, whisper_loss=0.0903, over 3874520.13 frames. ], batch size: 90, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:57:20,462 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-13 21:57:29,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2334330.0, ans=0.0 2024-08-13 21:57:46,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2334430.0, ans=0.1 2024-08-13 21:57:46,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2334430.0, ans=0.125 2024-08-13 21:57:51,450 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.314e+01 2.571e+01 2.868e+01 3.932e+01, threshold=5.142e+01, percent-clipped=0.0 2024-08-13 21:58:19,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2334630.0, ans=0.1 2024-08-13 21:58:22,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2334630.0, ans=0.125 2024-08-13 21:58:27,469 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
17 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-13 21:58:29,843 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 1600, loss[loss=0.1034, beats_loss=0.01207, ecapa_loss=0.0001499, whisper_loss=0.08987, over 22961.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01055, ecapa_loss=0.0001551, whisper_loss=0.09041, over 3857747.23 frames. ], batch size: 91, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:58:41,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2334730.0, ans=0.125 2024-08-13 21:58:54,104 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-13 21:59:10,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2334930.0, ans=0.0 2024-08-13 21:59:17,265 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 21 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-13 21:59:19,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2335030.0, ans=0.125 2024-08-13 21:59:23,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2335030.0, ans=0.1 2024-08-13 21:59:23,566 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.12 vs. limit=12.0 2024-08-13 21:59:32,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2335130.0, ans=0.1 2024-08-13 21:59:41,590 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 1650, loss[loss=0.05963, beats_loss=0.01227, ecapa_loss=0.0001556, whisper_loss=0.0458, over 16759.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01065, ecapa_loss=0.000154, whisper_loss=0.09016, over 3877006.77 frames. 
], batch size: 71, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:59:46,178 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-13 21:59:46,951 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.16 vs. limit=12.0 2024-08-13 21:59:55,204 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-13 21:59:57,937 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 31 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-13 22:00:04,712 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.84 vs. limit=15.0 2024-08-13 22:00:08,065 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 16 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 22:00:15,686 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.351e+01 2.606e+01 2.894e+01 4.343e+01, threshold=5.211e+01, percent-clipped=0.0 2024-08-13 22:00:17,534 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
16 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-13 22:00:26,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2335530.0, ans=0.125 2024-08-13 22:00:27,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2335530.0, ans=0.125 2024-08-13 22:00:35,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2335530.0, ans=0.0 2024-08-13 22:00:45,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2335630.0, ans=0.125 2024-08-13 22:00:52,862 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 1700, loss[loss=0.07268, beats_loss=0.01368, ecapa_loss=0.0001301, whisper_loss=0.0577, over 16569.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01061, ecapa_loss=0.0001538, whisper_loss=0.09064, over 3902253.05 frames. ], batch size: 68, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:00:57,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2335730.0, ans=0.1 2024-08-13 22:00:59,856 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-13 22:01:10,461 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 22:01:15,481 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.85 vs. limit=15.0 2024-08-13 22:01:22,380 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-13 22:01:29,194 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
28 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-13 22:01:34,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2336030.0, ans=0.125 2024-08-13 22:01:37,835 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.33 vs. limit=12.0 2024-08-13 22:01:40,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2336030.0, ans=0.1 2024-08-13 22:01:51,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2336130.0, ans=0.1 2024-08-13 22:02:00,432 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2024-08-13 22:02:02,563 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 1750, loss[loss=0.09917, beats_loss=0.01224, ecapa_loss=0.0001619, whisper_loss=0.08531, over 19916.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0106, ecapa_loss=0.0001539, whisper_loss=0.09047, over 3885797.44 frames. 
], batch size: 81, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:02:04,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=2336230.0, ans=0.1 2024-08-13 22:02:10,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2336230.0, ans=0.125 2024-08-13 22:02:12,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2336230.0, ans=0.1 2024-08-13 22:02:15,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2336330.0, ans=0.1 2024-08-13 22:02:25,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2336330.0, ans=0.07 2024-08-13 22:02:27,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2336330.0, ans=0.0 2024-08-13 22:02:27,892 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.59 vs. limit=15.0 2024-08-13 22:02:30,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2336430.0, ans=0.0 2024-08-13 22:02:30,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2336430.0, ans=0.1 2024-08-13 22:02:35,028 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.335e+01 2.606e+01 3.098e+01 1.901e+02, threshold=5.212e+01, percent-clipped=3.0 2024-08-13 22:02:35,233 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
26 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-13 22:02:38,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2336430.0, ans=0.125 2024-08-13 22:02:50,368 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.67 vs. limit=15.0 2024-08-13 22:03:11,817 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 1800, loss[loss=0.1004, beats_loss=0.01061, ecapa_loss=0.0001695, whisper_loss=0.08812, over 20634.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01057, ecapa_loss=0.0001543, whisper_loss=0.09074, over 3882431.77 frames. ], batch size: 85, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:03:54,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2337030.0, ans=0.0 2024-08-13 22:03:58,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2337030.0, ans=0.1 2024-08-13 22:03:59,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2337030.0, ans=0.0 2024-08-13 22:04:07,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2337130.0, ans=0.09899494936611666 2024-08-13 22:04:10,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2337130.0, ans=0.0 2024-08-13 22:04:16,787 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-13 22:04:20,858 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 1850, loss[loss=0.09673, beats_loss=0.01025, ecapa_loss=0.0002554, whisper_loss=0.08392, over 16182.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01053, ecapa_loss=0.0001545, whisper_loss=0.09039, over 3854735.02 frames. 
], batch size: 71, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:04:30,136 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 22:04:45,757 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.37 vs. limit=22.5 2024-08-13 22:04:52,889 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.324e+01 2.518e+01 2.718e+01 4.142e+01, threshold=5.036e+01, percent-clipped=0.0 2024-08-13 22:04:54,479 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-13 22:05:19,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2337630.0, ans=0.0 2024-08-13 22:05:30,322 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 1900, loss[loss=0.09192, beats_loss=0.01064, ecapa_loss=0.000135, whisper_loss=0.07993, over 20492.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01058, ecapa_loss=0.0001553, whisper_loss=0.09015, over 3844231.79 frames. ], batch size: 81, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:05:37,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2337730.0, ans=0.09899494936611666 2024-08-13 22:05:46,730 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.98 vs. limit=12.0 2024-08-13 22:06:07,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2337930.0, ans=0.2 2024-08-13 22:06:11,951 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
16 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-13 22:06:18,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2338030.0, ans=0.125 2024-08-13 22:06:31,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2338030.0, ans=0.125 2024-08-13 22:06:34,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2338030.0, ans=0.125 2024-08-13 22:06:45,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2338130.0, ans=0.125 2024-08-13 22:06:52,656 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 1950, loss[loss=0.09929, beats_loss=0.01053, ecapa_loss=0.0001217, whisper_loss=0.08754, over 14658.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01064, ecapa_loss=0.000155, whisper_loss=0.08972, over 3774886.02 frames. ], batch size: 53, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:06:54,362 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 22:06:54,520 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.05 vs. 
limit=15.0 2024-08-13 22:07:08,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2338330.0, ans=0.1 2024-08-13 22:07:15,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2338330.0, ans=0.1 2024-08-13 22:07:23,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2338430.0, ans=0.0 2024-08-13 22:07:30,598 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.359e+01 2.594e+01 2.893e+01 6.920e+01, threshold=5.188e+01, percent-clipped=1.0 2024-08-13 22:07:37,326 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-13 22:07:43,086 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 25 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-13 22:08:00,938 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 22:08:13,702 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 2000, loss[loss=0.131, beats_loss=0.008269, ecapa_loss=0.0001507, whisper_loss=0.1212, over 21583.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01064, ecapa_loss=0.000156, whisper_loss=0.08954, over 3768855.66 frames. ], batch size: 79, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:08:22,826 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 34 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-13 22:08:25,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2338730.0, ans=0.125 2024-08-13 22:08:42,543 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
25 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-13 22:09:07,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2339030.0, ans=0.05 2024-08-13 22:09:09,036 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-13 22:09:18,294 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 35 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-13 22:09:30,996 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 15 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 22:09:35,049 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 2050, loss[loss=0.1093, beats_loss=0.01058, ecapa_loss=0.0001167, whisper_loss=0.09752, over 24152.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01066, ecapa_loss=0.0001556, whisper_loss=0.08949, over 3811075.38 frames. ], batch size: 90, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:10:12,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2339430.0, ans=0.125 2024-08-13 22:10:13,073 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.291e+01 2.646e+01 3.086e+01 1.043e+02, threshold=5.292e+01, percent-clipped=1.0 2024-08-13 22:10:26,381 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
22 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 22:10:34,721 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2339530.0, ans=0.125 2024-08-13 22:10:36,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2339530.0, ans=0.125 2024-08-13 22:10:36,826 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2339530.0, ans=0.125 2024-08-13 22:10:38,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2339630.0, ans=0.125 2024-08-13 22:10:41,680 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 19 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-13 22:10:44,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=2339630.0, ans=0.02 2024-08-13 22:10:57,127 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 2100, loss[loss=0.117, beats_loss=0.009528, ecapa_loss=0.0001871, whisper_loss=0.1056, over 20107.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01065, ecapa_loss=0.0001555, whisper_loss=0.08967, over 3817992.47 frames. ], batch size: 78, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:11:02,329 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-13 22:11:16,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2339830.0, ans=0.1 2024-08-13 22:11:45,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2340030.0, ans=0.125 2024-08-13 22:11:45,977 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.12 vs. limit=12.0 2024-08-13 22:11:50,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2340030.0, ans=0.2 2024-08-13 22:12:00,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2340130.0, ans=0.125 2024-08-13 22:12:01,771 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.14 vs. limit=15.0 2024-08-13 22:12:14,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=2340230.0, ans=15.0 2024-08-13 22:12:14,758 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 2150, loss[loss=0.1086, beats_loss=0.01189, ecapa_loss=0.000138, whisper_loss=0.0953, over 23297.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0107, ecapa_loss=0.0001544, whisper_loss=0.09049, over 3818804.24 frames. ], batch size: 90, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:12:23,472 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
32 from LS+wenet, 24 from Vox, 34 from AS 2024-08-13 22:12:36,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2340330.0, ans=0.1 2024-08-13 22:12:54,187 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.324e+01 2.581e+01 2.963e+01 1.302e+02, threshold=5.163e+01, percent-clipped=1.0 2024-08-13 22:12:54,336 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 from AS 2024-08-13 22:12:54,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2340430.0, ans=0.2 2024-08-13 22:13:02,629 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 26 from Vox, 31 from AS 2024-08-13 22:13:06,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2340530.0, ans=0.0 2024-08-13 22:13:06,573 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.79 vs. limit=22.5 2024-08-13 22:13:11,871 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 18 from Vox, 41 from AS 2024-08-13 22:13:24,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2340630.0, ans=0.125 2024-08-13 22:13:36,584 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 2200, loss[loss=0.1003, beats_loss=0.01042, ecapa_loss=0.0001675, whisper_loss=0.08823, over 19645.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0107, ecapa_loss=0.0001558, whisper_loss=0.09113, over 3841656.55 frames. 
], batch size: 77, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:13:39,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2340730.0, ans=0.2 2024-08-13 22:13:40,171 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 16 from LS+wenet, 13 from Vox, 26 from AS 2024-08-13 22:13:44,772 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 22 from LS+wenet, 17 from Vox, 17 from AS 2024-08-13 22:13:50,937 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.78 vs. limit=15.0 2024-08-13 22:13:56,974 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.43 vs. limit=12.0 2024-08-13 22:14:32,347 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 20 from LS+wenet, 11 from Vox, 26 from AS 2024-08-13 22:14:37,415 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 21 from Vox, 26 from AS 2024-08-13 22:14:39,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2341130.0, ans=0.0 2024-08-13 22:14:57,135 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 2250, loss[loss=0.08418, beats_loss=0.01266, ecapa_loss=0.0001449, whisper_loss=0.07008, over 13812.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0108, ecapa_loss=0.0001554, whisper_loss=0.091, over 3847075.05 frames. ], batch size: 56, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:14:57,329 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 15 from Vox, 29 from AS 2024-08-13 22:15:02,040 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 26 from Vox, 32 from AS 2024-08-13 22:15:22,579 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
22 from LS+wenet, 20 from Vox, 32 from AS 2024-08-13 22:15:35,330 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.396e+01 2.660e+01 2.938e+01 1.173e+02, threshold=5.320e+01, percent-clipped=2.0 2024-08-13 22:15:45,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2341530.0, ans=0.015 2024-08-13 22:15:47,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2341530.0, ans=0.125 2024-08-13 22:15:49,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2341530.0, ans=0.0 2024-08-13 22:15:53,225 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 15 from Vox, 28 from AS 2024-08-13 22:15:55,079 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 19 from Vox, 30 from AS 2024-08-13 22:15:59,794 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 19 from Vox, 40 from AS 2024-08-13 22:16:08,194 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 17 from Vox, 24 from AS 2024-08-13 22:16:09,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2341630.0, ans=0.125 2024-08-13 22:16:13,315 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 21 from Vox, 25 from AS 2024-08-13 22:16:18,967 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 2300, loss[loss=0.1107, beats_loss=0.01151, ecapa_loss=0.0001471, whisper_loss=0.09775, over 21443.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01085, ecapa_loss=0.0001554, whisper_loss=0.09169, over 3849023.57 frames. 
], batch size: 84, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:16:24,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2341730.0, ans=0.1 2024-08-13 22:16:38,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2341830.0, ans=0.0 2024-08-13 22:16:39,008 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.10 vs. limit=10.0 2024-08-13 22:17:17,966 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=16.81 vs. limit=15.0 2024-08-13 22:17:19,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2342030.0, ans=0.1 2024-08-13 22:17:27,164 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.32 vs. limit=15.0 2024-08-13 22:17:31,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2342130.0, ans=0.09899494936611666 2024-08-13 22:17:35,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2342130.0, ans=0.125 2024-08-13 22:17:35,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2342130.0, ans=0.1 2024-08-13 22:17:39,676 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 2350, loss[loss=0.09931, beats_loss=0.01023, ecapa_loss=0.0001507, whisper_loss=0.08757, over 19540.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0107, ecapa_loss=0.0001583, whisper_loss=0.09256, over 3876345.29 frames. 
], batch size: 74, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:17:49,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2342230.0, ans=0.125 2024-08-13 22:17:56,483 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.98 vs. limit=6.0 2024-08-13 22:17:58,877 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 17 from Vox, 34 from AS 2024-08-13 22:18:14,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2342430.0, ans=0.125 2024-08-13 22:18:19,695 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.389e+01 2.636e+01 2.881e+01 1.786e+02, threshold=5.272e+01, percent-clipped=1.0 2024-08-13 22:18:31,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2342530.0, ans=0.0 2024-08-13 22:18:35,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2342530.0, ans=0.035 2024-08-13 22:18:47,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2342630.0, ans=0.0 2024-08-13 22:18:48,291 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 20 from LS+wenet, 10 from Vox, 24 from AS 2024-08-13 22:18:50,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2342630.0, ans=0.125 2024-08-13 22:18:56,614 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 14 from Vox, 24 from AS 2024-08-13 22:19:01,131 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 2400, loss[loss=0.1261, beats_loss=0.00964, ecapa_loss=0.0001441, whisper_loss=0.115, over 20519.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01067, ecapa_loss=0.0001588, whisper_loss=0.09211, over 3858304.75 frames. ], batch size: 78, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:19:07,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2342730.0, ans=0.125 2024-08-13 22:19:16,165 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 26 from LS+wenet, 15 from Vox, 29 from AS 2024-08-13 22:19:24,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2342830.0, ans=0.2 2024-08-13 22:19:34,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2342930.0, ans=0.0 2024-08-13 22:19:41,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2342930.0, ans=0.125 2024-08-13 22:20:08,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2343130.0, ans=0.0 2024-08-13 22:20:15,233 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 20 from Vox, 36 from AS 2024-08-13 22:20:16,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2343130.0, ans=0.0 2024-08-13 22:20:17,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2343130.0, ans=0.04949747468305833 2024-08-13 22:20:24,391 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 2450, loss[loss=0.1268, beats_loss=0.006709, ecapa_loss=0.0002086, whisper_loss=0.118, over 21004.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01066, ecapa_loss=0.0001601, whisper_loss=0.09174, over 3844417.14 frames. 
], batch size: 86, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:20:42,134 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 22 from Vox, 32 from AS 2024-08-13 22:20:50,436 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 33 from Vox, 37 from AS 2024-08-13 22:21:01,159 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 9 from LS+wenet, 19 from Vox, 26 from AS 2024-08-13 22:21:05,821 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.291e+01 2.587e+01 2.997e+01 1.554e+02, threshold=5.173e+01, percent-clipped=3.0 2024-08-13 22:21:14,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2343530.0, ans=0.07 2024-08-13 22:21:39,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2343630.0, ans=0.125 2024-08-13 22:21:47,785 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 2500, loss[loss=0.1162, beats_loss=0.009375, ecapa_loss=0.0001699, whisper_loss=0.1051, over 22850.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01061, ecapa_loss=0.0001616, whisper_loss=0.09172, over 3847703.60 frames. ], batch size: 90, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:21:53,406 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.81 vs. 
limit=12.0 2024-08-13 22:21:58,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2343730.0, ans=0.1 2024-08-13 22:22:08,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2343830.0, ans=0.125 2024-08-13 22:22:08,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2343830.0, ans=0.125 2024-08-13 22:22:10,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2343830.0, ans=0.125 2024-08-13 22:22:10,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2343830.0, ans=0.09899494936611666 2024-08-13 22:22:16,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2343830.0, ans=0.125 2024-08-13 22:22:21,187 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.76 vs. limit=15.0 2024-08-13 22:22:25,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2343930.0, ans=0.0 2024-08-13 22:22:26,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2343930.0, ans=0.125 2024-08-13 22:22:31,221 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 20 from Vox, 40 from AS 2024-08-13 22:22:41,043 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 from AS 2024-08-13 22:22:46,058 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.46 vs. 
limit=15.0 2024-08-13 22:22:47,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2344030.0, ans=0.0 2024-08-13 22:22:52,608 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.11 vs. limit=15.0 2024-08-13 22:23:03,502 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.47 vs. limit=6.0 2024-08-13 22:23:05,580 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 from AS 2024-08-13 22:23:12,893 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 2550, loss[loss=0.09514, beats_loss=0.01209, ecapa_loss=0.0001544, whisper_loss=0.08151, over 17184.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01065, ecapa_loss=0.0001604, whisper_loss=0.09118, over 3875704.65 frames. ], batch size: 69, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:23:49,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2344430.0, ans=0.05 2024-08-13 22:23:53,766 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.329e+01 2.677e+01 3.229e+01 5.510e+01, threshold=5.353e+01, percent-clipped=1.0 2024-08-13 22:23:55,598 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 16 from LS+wenet, 17 from Vox, 22 from AS 2024-08-13 22:24:17,827 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 29 from Vox, 33 from AS 2024-08-13 22:24:28,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2344630.0, ans=0.125 2024-08-13 22:24:35,889 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 2600, loss[loss=0.08446, beats_loss=0.01371, ecapa_loss=0.0001419, whisper_loss=0.06934, over 21347.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01077, ecapa_loss=0.000159, whisper_loss=0.09066, over 3857404.94 frames. ], batch size: 89, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:24:44,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2344730.0, ans=0.025 2024-08-13 22:24:51,350 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.34 vs. limit=15.0 2024-08-13 22:24:54,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2344830.0, ans=0.1 2024-08-13 22:25:11,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2344930.0, ans=0.0 2024-08-13 22:25:20,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2344930.0, ans=0.1 2024-08-13 22:25:32,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2345030.0, ans=0.125 2024-08-13 22:25:36,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2345030.0, ans=0.07 2024-08-13 22:25:40,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2345130.0, ans=0.2 2024-08-13 22:25:47,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2345130.0, ans=0.1 2024-08-13 22:25:54,401 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 2650, loss[loss=0.1256, beats_loss=0.009579, ecapa_loss=0.0001725, whisper_loss=0.1143, over 23118.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01075, ecapa_loss=0.0001594, whisper_loss=0.09068, over 3859167.42 frames. ], batch size: 91, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:26:02,634 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 18 from Vox, 37 from AS 2024-08-13 22:26:05,043 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 from AS 2024-08-13 22:26:08,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2345330.0, ans=0.125 2024-08-13 22:26:20,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2345330.0, ans=0.125 2024-08-13 22:26:22,761 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 24 from Vox, 42 from AS 2024-08-13 22:26:31,857 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.316e+01 2.514e+01 2.879e+01 4.241e+01, threshold=5.029e+01, percent-clipped=0.0 2024-08-13 22:26:40,209 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 13 from Vox, 48 from AS 2024-08-13 22:26:51,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2345530.0, ans=0.125 2024-08-13 22:26:52,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2345530.0, ans=0.0 2024-08-13 22:26:57,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2345630.0, ans=0.125 2024-08-13 22:27:10,398 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 23 from LS+wenet, 22 from Vox, 41 from AS 2024-08-13 22:27:13,543 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 2700, loss[loss=0.09272, beats_loss=0.0111, ecapa_loss=9.507e-05, whisper_loss=0.08067, over 15610.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01078, ecapa_loss=0.0001589, whisper_loss=0.09061, over 3871529.74 frames. ], batch size: 56, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:27:20,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2345730.0, ans=0.1 2024-08-13 22:27:23,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2345730.0, ans=0.2 2024-08-13 22:27:26,241 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 11 from LS+wenet, 18 from Vox, 26 from AS 2024-08-13 22:27:39,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2345830.0, ans=0.125 2024-08-13 22:27:56,424 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.66 vs. limit=15.0 2024-08-13 22:27:57,376 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 21 from LS+wenet, 21 from Vox, 48 from AS 2024-08-13 22:28:02,061 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 22:28:03,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2346030.0, ans=0.0 2024-08-13 22:28:11,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2346030.0, ans=0.125 2024-08-13 22:28:12,388 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 from AS 2024-08-13 22:28:28,110 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
12 from LS+wenet, 23 from Vox, 28 from AS 2024-08-13 22:28:32,297 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 2750, loss[loss=0.09277, beats_loss=0.01172, ecapa_loss=0.0001482, whisper_loss=0.07956, over 22529.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01079, ecapa_loss=0.0001589, whisper_loss=0.09059, over 3881596.20 frames. ], batch size: 90, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:28:37,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2346230.0, ans=0.0 2024-08-13 22:28:40,956 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 27 from Vox, 40 from AS 2024-08-13 22:29:04,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2346430.0, ans=0.0 2024-08-13 22:29:08,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2346430.0, ans=0.2 2024-08-13 22:29:11,723 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.410e+01 2.665e+01 3.029e+01 5.908e+01, threshold=5.329e+01, percent-clipped=1.0 2024-08-13 22:29:11,940 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 21 from LS+wenet, 17 from Vox, 37 from AS 2024-08-13 22:29:28,426 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 28 from LS+wenet, 14 from Vox, 23 from AS 2024-08-13 22:29:34,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2346630.0, ans=10.0 2024-08-13 22:29:41,644 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 15 from LS+wenet, 19 from Vox, 25 from AS 2024-08-13 22:29:43,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2346630.0, ans=0.0 2024-08-13 22:29:46,476 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
22 from LS+wenet, 19 from Vox, 32 from AS 2024-08-13 22:29:50,601 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 2800, loss[loss=0.1094, beats_loss=0.0128, ecapa_loss=0.0001428, whisper_loss=0.09517, over 22370.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01081, ecapa_loss=0.0001584, whisper_loss=0.09056, over 3865743.89 frames. ], batch size: 91, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:29:53,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2346730.0, ans=0.0 2024-08-13 22:29:54,950 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.18 vs. limit=15.0 2024-08-13 22:29:59,526 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.40 vs. limit=15.0 2024-08-13 22:30:11,283 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 17 from Vox, 27 from AS 2024-08-13 22:30:11,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2346830.0, ans=0.1 2024-08-13 22:30:23,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2346930.0, ans=0.0 2024-08-13 22:30:31,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2346930.0, ans=0.0 2024-08-13 22:30:34,027 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
23 from LS+wenet, 9 from Vox, 27 from AS 2024-08-13 22:30:43,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2347030.0, ans=0.125 2024-08-13 22:31:10,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2347130.0, ans=0.1 2024-08-13 22:31:15,252 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 2850, loss[loss=0.09191, beats_loss=0.01152, ecapa_loss=0.0001719, whisper_loss=0.07867, over 20943.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01093, ecapa_loss=0.0001573, whisper_loss=0.0904, over 3882939.27 frames. ], batch size: 89, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:31:34,074 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.56 vs. limit=12.0 2024-08-13 22:31:39,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2347330.0, ans=0.0 2024-08-13 22:31:44,315 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2024-08-13 22:31:52,556 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.368e+01 2.681e+01 3.083e+01 7.841e+01, threshold=5.363e+01, percent-clipped=3.0 2024-08-13 22:31:54,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2347430.0, ans=0.125 2024-08-13 22:31:59,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2347430.0, ans=0.125 2024-08-13 22:32:02,839 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
23 from LS+wenet, 23 from Vox, 25 from AS 2024-08-13 22:32:19,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2347530.0, ans=0.125 2024-08-13 22:32:26,921 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.78 vs. limit=15.0 2024-08-13 22:32:43,695 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 2900, loss[loss=0.1158, beats_loss=0.008934, ecapa_loss=0.0001894, whisper_loss=0.105, over 18750.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01084, ecapa_loss=0.0001593, whisper_loss=0.09089, over 3886851.85 frames. ], batch size: 76, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:33:14,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2347830.0, ans=0.125 2024-08-13 22:34:16,139 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 35 from LS+wenet, 15 from Vox, 44 from AS 2024-08-13 22:34:31,252 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 2950, loss[loss=0.1252, beats_loss=0.009315, ecapa_loss=0.0001805, whisper_loss=0.114, over 22959.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01078, ecapa_loss=0.000161, whisper_loss=0.09137, over 3904189.30 frames. 
], batch size: 92, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:34:42,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2348230.0, ans=0.125 2024-08-13 22:34:46,850 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.134e+01 2024-08-13 22:34:54,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2348330.0, ans=0.125 2024-08-13 22:34:58,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2348330.0, ans=0.125 2024-08-13 22:35:29,775 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.427e+01 2.649e+01 3.118e+01 1.077e+02, threshold=5.298e+01, percent-clipped=4.0 2024-08-13 22:35:30,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2348430.0, ans=0.1 2024-08-13 22:35:33,148 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.08 vs. limit=22.5 2024-08-13 22:35:39,475 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2348430.0, ans=0.1 2024-08-13 22:36:03,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2348530.0, ans=0.125 2024-08-13 22:36:37,567 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 3000, loss[loss=0.1033, beats_loss=0.008169, ecapa_loss=0.0001799, whisper_loss=0.09336, over 17778.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01085, ecapa_loss=0.0001606, whisper_loss=0.09071, over 3883661.26 frames. 
], batch size: 71, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:36:37,567 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-13 22:37:40,749 INFO [train_multi_KD3.py:1149] (2/4) Epoch 17, validation on ASR_libri: loss=0.2526, beats_loss=0, ecapa_loss=0.0005533, whisper_loss=0.2471, over 922467.00 frames. 2024-08-13 22:38:04,801 INFO [train_multi_KD3.py:1149] (2/4) Epoch 17, validation on SV_voxceleb1: loss=0.004391, beats_loss=0, ecapa_loss=0.0004391, whisper_loss=0, over 939242.00 frames. 2024-08-13 22:41:12,777 INFO [train_multi_KD3.py:1149] (2/4) Epoch 17, validation on AT_audioset: loss=0.02357, beats_loss=0.02357, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 22:41:12,782 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-13 22:41:22,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2348730.0, ans=0.125 2024-08-13 22:41:39,121 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.84 vs. limit=15.0 2024-08-13 22:42:23,176 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 19 from Vox, 21 from AS 2024-08-13 22:42:34,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2349130.0, ans=0.125 2024-08-13 22:42:41,663 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 3050, loss[loss=0.1085, beats_loss=0.009172, ecapa_loss=0.0001349, whisper_loss=0.09794, over 18152.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01084, ecapa_loss=0.000161, whisper_loss=0.091, over 3895789.22 frames. ], batch size: 71, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:42:44,270 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
18 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 22:42:47,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2349230.0, ans=0.0 2024-08-13 22:42:55,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2349230.0, ans=0.05 2024-08-13 22:43:09,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2349330.0, ans=0.0 2024-08-13 22:43:09,630 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.85 vs. limit=6.0 2024-08-13 22:43:25,583 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.432e+01 2.716e+01 3.181e+01 1.148e+02, threshold=5.433e+01, percent-clipped=2.0 2024-08-13 22:43:33,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2349430.0, ans=0.0 2024-08-13 22:43:40,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2349530.0, ans=0.05 2024-08-13 22:43:43,025 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 22:43:45,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2349530.0, ans=0.125 2024-08-13 22:43:46,406 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-13 22:44:11,978 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 3100, loss[loss=0.1168, beats_loss=0.01089, ecapa_loss=0.0001484, whisper_loss=0.1044, over 22108.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0108, ecapa_loss=0.0001607, whisper_loss=0.09179, over 3890137.16 frames. 
], batch size: 90, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:44:12,071 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 22 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-13 22:44:19,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=2349730.0, ans=10.0 2024-08-13 22:44:37,440 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 22:44:37,909 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.70 vs. limit=12.0 2024-08-13 22:44:56,940 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 28 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-13 22:45:06,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2350030.0, ans=0.125 2024-08-13 22:45:09,448 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-13 22:45:14,993 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.17 vs. limit=15.0 2024-08-13 22:45:19,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2350130.0, ans=0.125 2024-08-13 22:45:24,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2350130.0, ans=0.125 2024-08-13 22:45:26,145 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-13 22:45:37,928 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 3150, loss[loss=0.1057, beats_loss=0.01082, ecapa_loss=0.0001874, whisper_loss=0.09299, over 21867.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0108, ecapa_loss=0.0001608, whisper_loss=0.09183, over 3896207.88 frames. 
], batch size: 91, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:45:38,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2350230.0, ans=0.125 2024-08-13 22:45:56,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2350330.0, ans=0.125 2024-08-13 22:45:59,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2350330.0, ans=0.125 2024-08-13 22:46:17,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2350430.0, ans=0.2 2024-08-13 22:46:20,843 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.358e+01 2.601e+01 2.838e+01 4.154e+01, threshold=5.202e+01, percent-clipped=0.0 2024-08-13 22:46:25,102 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-13 22:46:40,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2350530.0, ans=0.125 2024-08-13 22:46:46,779 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-13 22:46:55,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2350630.0, ans=0.1 2024-08-13 22:46:56,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2350630.0, ans=0.2 2024-08-13 22:46:58,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2350630.0, ans=0.125 2024-08-13 22:47:00,806 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
24 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-13 22:47:04,908 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.64 vs. limit=22.5 2024-08-13 22:47:07,217 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 3200, loss[loss=0.102, beats_loss=0.009658, ecapa_loss=0.0001681, whisper_loss=0.09065, over 21831.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01076, ecapa_loss=0.0001613, whisper_loss=0.0921, over 3865642.71 frames. ], batch size: 87, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:48:28,550 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.12 vs. limit=8.0 2024-08-13 22:48:31,971 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-13 22:48:32,588 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.48 vs. limit=15.0 2024-08-13 22:48:37,596 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 3250, loss[loss=0.1009, beats_loss=0.01027, ecapa_loss=0.0001634, whisper_loss=0.08901, over 22820.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01075, ecapa_loss=0.0001611, whisper_loss=0.09238, over 3874386.61 frames. ], batch size: 93, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:48:45,985 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-13 22:49:19,547 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.385e+01 2.597e+01 2.999e+01 7.217e+01, threshold=5.195e+01, percent-clipped=0.0 2024-08-13 22:49:45,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2351530.0, ans=0.1 2024-08-13 22:49:51,195 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
26 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-13 22:49:58,179 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.03 vs. limit=15.0 2024-08-13 22:50:00,624 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.640e-03 2024-08-13 22:50:02,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2351630.0, ans=0.0 2024-08-13 22:50:05,127 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 3300, loss[loss=0.111, beats_loss=0.01064, ecapa_loss=0.0001399, whisper_loss=0.09899, over 22298.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01074, ecapa_loss=0.0001612, whisper_loss=0.09226, over 3898349.90 frames. ], batch size: 87, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:50:12,765 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-13 22:50:14,051 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-13 22:50:15,997 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-13 22:50:22,693 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2024-08-13 22:50:29,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2351830.0, ans=0.1 2024-08-13 22:50:31,085 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.65 vs. limit=15.0 2024-08-13 22:50:49,077 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
25 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-13 22:50:52,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=2351930.0, ans=0.02 2024-08-13 22:50:55,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2351930.0, ans=0.125 2024-08-13 22:50:58,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2352030.0, ans=0.125 2024-08-13 22:51:04,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2352030.0, ans=0.1 2024-08-13 22:51:16,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2352130.0, ans=0.07 2024-08-13 22:51:20,148 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 30 from Vox, 25 fro AS 2024-08-13 22:51:23,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2352130.0, ans=0.2 2024-08-13 22:51:27,240 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 29 from Vox, 25 fro AS 2024-08-13 22:51:29,061 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 18 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 22:51:29,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2352230.0, ans=0.125 2024-08-13 22:51:30,145 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 3350, loss[loss=0.07919, beats_loss=0.01109, ecapa_loss=0.0001649, whisper_loss=0.06645, over 19219.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01069, ecapa_loss=0.000162, whisper_loss=0.09236, over 3878709.32 frames. 
], batch size: 77, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:51:49,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2352330.0, ans=0.1 2024-08-13 22:51:52,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2352330.0, ans=10.0 2024-08-13 22:51:53,498 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-13 22:51:55,118 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 37 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-13 22:52:11,091 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.332e+01 2.587e+01 3.048e+01 7.749e+01, threshold=5.173e+01, percent-clipped=3.0 2024-08-13 22:52:12,842 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-13 22:52:46,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2352630.0, ans=0.125 2024-08-13 22:52:53,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2352630.0, ans=0.125 2024-08-13 22:52:53,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2352630.0, ans=0.125 2024-08-13 22:52:56,427 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 3400, loss[loss=0.09221, beats_loss=0.01006, ecapa_loss=0.0001983, whisper_loss=0.08016, over 14496.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01073, ecapa_loss=0.000161, whisper_loss=0.09234, over 3879939.06 frames. 
], batch size: 60, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:53:22,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2352830.0, ans=0.95 2024-08-13 22:53:27,049 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 22:54:04,675 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 32 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-13 22:54:12,129 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 22:54:19,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2353130.0, ans=0.0 2024-08-13 22:54:26,422 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 3450, loss[loss=0.09293, beats_loss=0.01155, ecapa_loss=0.0001396, whisper_loss=0.07999, over 15276.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01082, ecapa_loss=0.0001605, whisper_loss=0.0917, over 3892617.82 frames. ], batch size: 58, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:54:35,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2353230.0, ans=0.125 2024-08-13 22:54:44,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2353330.0, ans=0.125 2024-08-13 22:54:49,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2353330.0, ans=0.125 2024-08-13 22:54:49,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2353330.0, ans=0.1 2024-08-13 22:54:53,556 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
21 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-13 22:54:57,860 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.56 vs. limit=12.0 2024-08-13 22:55:09,104 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.393e+01 2.606e+01 2.901e+01 5.659e+01, threshold=5.211e+01, percent-clipped=1.0 2024-08-13 22:55:11,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2353430.0, ans=0.09899494936611666 2024-08-13 22:55:17,621 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-13 22:55:36,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2353630.0, ans=0.125 2024-08-13 22:55:37,388 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-13 22:55:48,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2353630.0, ans=0.125 2024-08-13 22:55:52,621 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 3500, loss[loss=0.1027, beats_loss=0.01034, ecapa_loss=0.000171, whisper_loss=0.09064, over 23519.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01081, ecapa_loss=0.0001602, whisper_loss=0.09172, over 3895401.64 frames. 
], batch size: 93, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:55:58,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2353730.0, ans=0.0 2024-08-13 22:56:08,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2353830.0, ans=0.1 2024-08-13 22:56:14,810 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.60 vs. limit=22.5 2024-08-13 22:57:08,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2354130.0, ans=0.125 2024-08-13 22:57:15,865 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 3550, loss[loss=0.1055, beats_loss=0.009246, ecapa_loss=0.0001312, whisper_loss=0.09495, over 15893.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01082, ecapa_loss=0.0001598, whisper_loss=0.09161, over 3895531.03 frames. ], batch size: 57, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:57:21,148 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 36 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-13 22:57:26,970 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-13 22:57:28,601 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-13 22:57:34,167 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
29 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-13 22:57:54,854 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.111e+01 2.375e+01 2.617e+01 2.958e+01 4.205e+01, threshold=5.234e+01, percent-clipped=0.0 2024-08-13 22:58:26,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2354630.0, ans=0.125 2024-08-13 22:58:36,919 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 3600, loss[loss=0.1113, beats_loss=0.01058, ecapa_loss=0.0001512, whisper_loss=0.09926, over 19012.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01092, ecapa_loss=0.0001596, whisper_loss=0.09121, over 3904177.62 frames. ], batch size: 75, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:58:52,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2354830.0, ans=0.125 2024-08-13 22:59:00,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2354830.0, ans=0.125 2024-08-13 22:59:07,511 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-13 22:59:11,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2354930.0, ans=0.125 2024-08-13 22:59:24,000 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.77 vs. limit=15.0 2024-08-13 22:59:24,768 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 22:59:26,914 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.41 vs. 
limit=15.0 2024-08-13 22:59:37,070 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0 2024-08-13 22:59:38,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2355030.0, ans=0.125 2024-08-13 22:59:56,867 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 3650, loss[loss=0.105, beats_loss=0.0116, ecapa_loss=0.0001457, whisper_loss=0.09199, over 15400.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01091, ecapa_loss=0.0001589, whisper_loss=0.09146, over 3888085.30 frames. ], batch size: 60, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:00:00,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2355230.0, ans=0.125 2024-08-13 23:00:01,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2355230.0, ans=0.0 2024-08-13 23:00:01,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2355230.0, ans=0.125 2024-08-13 23:00:05,088 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.75 vs. limit=10.0 2024-08-13 23:00:18,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2355330.0, ans=0.125 2024-08-13 23:00:26,666 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 23:00:34,822 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.445e+01 2.700e+01 3.239e+01 5.632e+01, threshold=5.401e+01, percent-clipped=1.0 2024-08-13 23:00:42,610 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
16 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 23:00:51,634 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 23:00:57,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2355530.0, ans=0.0 2024-08-13 23:01:01,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2355630.0, ans=0.0 2024-08-13 23:01:06,971 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-13 23:01:15,934 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 3700, loss[loss=0.07893, beats_loss=0.01097, ecapa_loss=0.0001655, whisper_loss=0.0663, over 14765.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01086, ecapa_loss=0.0001599, whisper_loss=0.09127, over 3874884.95 frames. ], batch size: 59, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:01:19,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2355730.0, ans=0.1 2024-08-13 23:01:39,208 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.96 vs. limit=15.0 2024-08-13 23:01:40,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2355830.0, ans=0.125 2024-08-13 23:02:00,261 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 18 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-13 23:02:05,237 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 23:02:14,372 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
27 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 23:02:19,314 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 23:02:24,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2356130.0, ans=0.0 2024-08-13 23:02:34,223 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 3750, loss[loss=0.1267, beats_loss=0.009657, ecapa_loss=0.0001528, whisper_loss=0.1155, over 18122.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01092, ecapa_loss=0.0001593, whisper_loss=0.09105, over 3879356.35 frames. ], batch size: 68, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:03:03,082 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-13 23:03:05,064 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.80 vs. limit=22.5 2024-08-13 23:03:08,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2356430.0, ans=0.125 2024-08-13 23:03:10,244 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.349e+01 2.622e+01 2.917e+01 8.940e+01, threshold=5.244e+01, percent-clipped=1.0 2024-08-13 23:03:16,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2356430.0, ans=0.125 2024-08-13 23:03:20,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2356530.0, ans=0.0 2024-08-13 23:03:25,264 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.06 vs. 
limit=15.0 2024-08-13 23:03:33,217 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.18 vs. limit=22.5 2024-08-13 23:03:35,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2356630.0, ans=0.125 2024-08-13 23:03:48,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2356730.0, ans=0.125 2024-08-13 23:03:49,148 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 3800, loss[loss=0.09832, beats_loss=0.0106, ecapa_loss=0.0001852, whisper_loss=0.08587, over 22820.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01093, ecapa_loss=0.0001601, whisper_loss=0.09089, over 3872123.80 frames. ], batch size: 93, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:03:49,264 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 23:03:53,203 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.46 vs. limit=15.0 2024-08-13 23:04:03,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2356830.0, ans=0.09899494936611666 2024-08-13 23:04:23,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2356930.0, ans=0.2 2024-08-13 23:04:25,184 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
18 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-13 23:04:25,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2356930.0, ans=0.125 2024-08-13 23:04:42,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2357030.0, ans=0.1 2024-08-13 23:05:00,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2357130.0, ans=0.0 2024-08-13 23:05:01,174 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.62 vs. limit=15.0 2024-08-13 23:05:07,154 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 3850, loss[loss=0.09352, beats_loss=0.01141, ecapa_loss=0.0001754, whisper_loss=0.08036, over 18607.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01089, ecapa_loss=0.0001594, whisper_loss=0.09106, over 3867580.56 frames. ], batch size: 77, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:05:42,514 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 20 from LS+wenet, 34 from Vox, 36 fro AS 2024-08-13 23:05:44,057 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.323e+01 2.537e+01 2.804e+01 4.147e+01, threshold=5.073e+01, percent-clipped=0.0 2024-08-13 23:06:03,292 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.51 vs. limit=15.0 2024-08-13 23:06:23,296 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 3900, loss[loss=0.1204, beats_loss=0.008959, ecapa_loss=0.0001444, whisper_loss=0.11, over 20168.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01087, ecapa_loss=0.00016, whisper_loss=0.0908, over 3860510.30 frames. 
], batch size: 74, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:06:49,492 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-13 23:06:54,312 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 13 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-13 23:06:59,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2357930.0, ans=0.05 2024-08-13 23:07:09,604 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-13 23:07:11,509 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2358030.0, ans=0.125 2024-08-13 23:07:14,342 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.57 vs. limit=10.0 2024-08-13 23:07:19,253 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-13 23:07:22,433 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-13 23:07:29,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2358130.0, ans=0.2 2024-08-13 23:07:41,482 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 3950, loss[loss=0.09494, beats_loss=0.009251, ecapa_loss=0.0001792, whisper_loss=0.0839, over 16026.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01078, ecapa_loss=0.0001619, whisper_loss=0.09174, over 3874728.09 frames. 
], batch size: 64, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:07:41,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2358230.0, ans=0.0 2024-08-13 23:08:07,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2358330.0, ans=0.2 2024-08-13 23:08:14,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2358430.0, ans=0.125 2024-08-13 23:08:20,320 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.511e+01 2.750e+01 3.070e+01 4.670e+01, threshold=5.499e+01, percent-clipped=0.0 2024-08-13 23:08:57,434 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 4000, loss[loss=0.09524, beats_loss=0.009503, ecapa_loss=0.0001479, whisper_loss=0.08426, over 18844.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01077, ecapa_loss=0.0001612, whisper_loss=0.09144, over 3858993.32 frames. ], batch size: 72, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:09:03,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2358730.0, ans=0.1 2024-08-13 23:09:17,891 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 24 from LS+wenet, 8 from Vox, 23 fro AS 2024-08-13 23:09:21,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2358830.0, ans=0.0 2024-08-13 23:09:29,376 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
40 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 23:09:41,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2358930.0, ans=0.04949747468305833 2024-08-13 23:09:46,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2359030.0, ans=0.125 2024-08-13 23:09:49,304 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.39 vs. limit=6.0 2024-08-13 23:09:51,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2359030.0, ans=0.1 2024-08-13 23:09:53,651 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.11 vs. limit=15.0 2024-08-13 23:10:04,386 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.38 vs. limit=15.0 2024-08-13 23:10:14,578 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.85 vs. limit=22.5 2024-08-13 23:10:15,221 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 4050, loss[loss=0.1102, beats_loss=0.0123, ecapa_loss=0.0001346, whisper_loss=0.09657, over 22092.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01084, ecapa_loss=0.0001615, whisper_loss=0.09092, over 3856110.89 frames. ], batch size: 89, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:10:22,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2359230.0, ans=0.025 2024-08-13 23:10:23,797 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
20 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-13 23:10:26,007 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.31 vs. limit=15.0 2024-08-13 23:10:32,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2359330.0, ans=0.1 2024-08-13 23:10:44,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2359430.0, ans=0.125 2024-08-13 23:10:48,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2359430.0, ans=0.1 2024-08-13 23:10:51,742 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.408e+01 2.659e+01 2.975e+01 6.287e+01, threshold=5.318e+01, percent-clipped=1.0 2024-08-13 23:10:56,213 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.18 vs. limit=10.0 2024-08-13 23:10:58,520 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-13 23:11:29,845 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 4100, loss[loss=0.0947, beats_loss=0.01016, ecapa_loss=0.000138, whisper_loss=0.08316, over 23228.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01084, ecapa_loss=0.0001617, whisper_loss=0.09072, over 3858849.10 frames. ], batch size: 91, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:11:34,831 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 27 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-13 23:11:43,246 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
16 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-13 23:11:51,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2359830.0, ans=0.0 2024-08-13 23:11:54,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2359830.0, ans=0.125 2024-08-13 23:12:08,160 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=15.0 2024-08-13 23:12:18,360 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-13 23:12:30,856 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.92 vs. limit=22.5 2024-08-13 23:12:43,898 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-13 23:12:48,434 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 4150, loss[loss=0.1215, beats_loss=0.01001, ecapa_loss=0.0001774, whisper_loss=0.1097, over 23147.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0108, ecapa_loss=0.0001629, whisper_loss=0.09099, over 3871432.44 frames. ], batch size: 90, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:13:25,974 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.420e+01 2.616e+01 2.987e+01 7.044e+01, threshold=5.231e+01, percent-clipped=1.0 2024-08-13 23:13:26,242 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 27 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-13 23:13:49,204 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
34 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 23:13:51,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2360630.0, ans=0.0 2024-08-13 23:14:02,984 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 4200, loss[loss=0.1001, beats_loss=0.01189, ecapa_loss=0.000123, whisper_loss=0.08698, over 22493.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01078, ecapa_loss=0.000163, whisper_loss=0.09145, over 3913183.18 frames. ], batch size: 88, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:14:14,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2360730.0, ans=0.125 2024-08-13 23:14:32,745 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-13 23:14:34,146 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-13 23:14:45,225 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.98 vs. limit=10.0 2024-08-13 23:14:50,736 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.93 vs. limit=10.0 2024-08-13 23:14:53,085 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.441e+00 2024-08-13 23:15:01,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2361130.0, ans=0.0 2024-08-13 23:15:03,517 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.06 vs. 
limit=15.0 2024-08-13 23:15:12,169 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 4250, loss[loss=0.1156, beats_loss=0.009861, ecapa_loss=0.0001539, whisper_loss=0.1042, over 21691.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01079, ecapa_loss=0.0001625, whisper_loss=0.09094, over 3927348.15 frames. ], batch size: 90, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:15:18,359 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.58 vs. limit=15.0 2024-08-13 23:15:19,570 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.49 vs. limit=6.0 2024-08-13 23:15:25,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2361330.0, ans=0.035 2024-08-13 23:15:32,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2361330.0, ans=0.1 2024-08-13 23:15:37,367 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-13 23:15:39,112 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.21 vs. limit=15.0 2024-08-13 23:15:44,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2361430.0, ans=0.0 2024-08-13 23:15:44,823 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.294e+01 2.587e+01 2.870e+01 6.296e+01, threshold=5.174e+01, percent-clipped=1.0 2024-08-13 23:15:46,711 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.30 vs. 
limit=15.0 2024-08-13 23:15:51,555 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 23:16:10,174 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.08 vs. limit=15.0 2024-08-13 23:16:17,376 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 4300, loss[loss=0.1044, beats_loss=0.01137, ecapa_loss=0.0001561, whisper_loss=0.09146, over 14695.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01074, ecapa_loss=0.0001625, whisper_loss=0.09093, over 3899535.65 frames. ], batch size: 57, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:17:07,794 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 33 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-13 23:17:43,558 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 36 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-13 23:17:45,571 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-13 23:17:49,699 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-13 23:17:51,552 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-13 23:18:04,511 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-13 23:18:13,048 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 4350, loss[loss=0.0827, beats_loss=0.01117, ecapa_loss=0.0001706, whisper_loss=0.06982, over 17558.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01069, ecapa_loss=0.0001622, whisper_loss=0.09086, over 3891412.25 frames. 
], batch size: 71, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:18:13,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2362230.0, ans=0.125 2024-08-13 23:18:21,429 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.08 vs. limit=15.0 2024-08-13 23:18:52,369 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.708e+01 2.337e+01 2.576e+01 3.012e+01 4.056e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-13 23:18:56,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2362430.0, ans=0.125 2024-08-13 23:19:04,052 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 32 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 23:19:33,837 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 4400, loss[loss=0.09445, beats_loss=0.01457, ecapa_loss=0.0001407, whisper_loss=0.07847, over 17851.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01066, ecapa_loss=0.0001635, whisper_loss=0.09142, over 3871577.31 frames. ], batch size: 74, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:19:36,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2362730.0, ans=0.0 2024-08-13 23:19:39,531 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.70 vs. 
limit=22.5 2024-08-13 23:19:57,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2362830.0, ans=0.125 2024-08-13 23:20:38,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2363130.0, ans=0.0 2024-08-13 23:20:48,664 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 4450, loss[loss=0.1209, beats_loss=0.009975, ecapa_loss=0.0001565, whisper_loss=0.1094, over 23579.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01069, ecapa_loss=0.0001618, whisper_loss=0.09102, over 3889226.56 frames. ], batch size: 88, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:20:50,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2363230.0, ans=0.5 2024-08-13 23:20:52,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2363230.0, ans=0.1 2024-08-13 23:21:04,386 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.58 vs. limit=15.0 2024-08-13 23:21:10,807 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-13 23:21:12,151 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 
18 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-13 23:21:22,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2363430.0, ans=0.125 2024-08-13 23:21:28,621 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.409e+01 2.664e+01 2.942e+01 4.100e+01, threshold=5.327e+01, percent-clipped=0.0 2024-08-13 23:21:29,673 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.00 vs. limit=15.0 2024-08-13 23:21:30,245 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 33 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-13 23:21:30,949 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.83 vs. limit=15.0 2024-08-13 23:21:31,607 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 25 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-13 23:21:32,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2363430.0, ans=0.125 2024-08-13 23:21:32,907 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.79 vs. limit=6.0 2024-08-13 23:21:35,285 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.41 vs. limit=22.5 2024-08-13 23:22:09,829 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 4500, loss[loss=0.09935, beats_loss=0.01278, ecapa_loss=0.0001613, whisper_loss=0.08496, over 16773.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01072, ecapa_loss=0.000161, whisper_loss=0.09069, over 3872291.30 frames. 
], batch size: 69, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:22:13,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2363730.0, ans=0.125 2024-08-13 23:22:51,750 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 23:22:51,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2363930.0, ans=0.125 2024-08-13 23:22:56,267 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.13 vs. limit=15.0 2024-08-13 23:23:18,920 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 23:23:22,096 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-13 23:23:24,664 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 4550, loss[loss=0.1107, beats_loss=0.009511, ecapa_loss=0.0001363, whisper_loss=0.09981, over 19670.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01068, ecapa_loss=0.0001616, whisper_loss=0.09105, over 3883979.37 frames. ], batch size: 75, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:23:32,034 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 23:23:32,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2364230.0, ans=0.125 2024-08-13 23:23:33,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2364230.0, ans=0.0 2024-08-13 23:23:36,484 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
27 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 23:23:39,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2364330.0, ans=0.125 2024-08-13 23:23:40,845 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 23:23:42,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2364330.0, ans=0.07 2024-08-13 23:23:47,365 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.39 vs. limit=15.0 2024-08-13 23:23:49,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2364330.0, ans=0.07 2024-08-13 23:24:00,078 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.368e+01 2.686e+01 2.952e+01 5.692e+01, threshold=5.373e+01, percent-clipped=1.0 2024-08-13 23:24:11,659 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-13 23:24:13,417 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.19 vs. limit=15.0 2024-08-13 23:24:33,934 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 4600, loss[loss=0.1084, beats_loss=0.009782, ecapa_loss=0.0001958, whisper_loss=0.09671, over 19155.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01072, ecapa_loss=0.0001618, whisper_loss=0.09128, over 3863820.02 frames. ], batch size: 81, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:24:49,221 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.40 vs. 
limit=22.5 2024-08-13 23:24:59,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=2364930.0, ans=22.5 2024-08-13 23:25:06,764 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-13 23:25:13,966 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-13 23:25:14,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2365030.0, ans=0.125 2024-08-13 23:25:21,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2365030.0, ans=0.0 2024-08-13 23:25:24,702 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 18 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-13 23:25:26,822 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=15.0 2024-08-13 23:25:37,599 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 23:25:41,639 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 4650, loss[loss=0.1067, beats_loss=0.01028, ecapa_loss=0.0001791, whisper_loss=0.09461, over 18729.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01079, ecapa_loss=0.0001624, whisper_loss=0.09133, over 3871180.17 frames. ], batch size: 75, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:25:43,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2365230.0, ans=0.125 2024-08-13 23:25:52,663 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.44 vs. 
limit=15.0 2024-08-13 23:26:01,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2365330.0, ans=0.125 2024-08-13 23:26:03,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2365330.0, ans=0.125 2024-08-13 23:26:15,242 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.449e+01 2.734e+01 2.969e+01 1.115e+02, threshold=5.467e+01, percent-clipped=2.0 2024-08-13 23:26:16,857 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 15 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-13 23:26:32,864 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. limit=6.0 2024-08-13 23:26:41,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2365630.0, ans=0.0 2024-08-13 23:26:42,708 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-13 23:26:44,519 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.26 vs. limit=15.0 2024-08-13 23:26:45,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2365630.0, ans=10.0 2024-08-13 23:26:47,424 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 4700, loss[loss=0.1293, beats_loss=0.007881, ecapa_loss=0.0001874, whisper_loss=0.1196, over 21239.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01074, ecapa_loss=0.0001624, whisper_loss=0.09155, over 3880312.73 frames. 
], batch size: 87, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:26:50,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2365730.0, ans=0.0 2024-08-13 23:26:55,658 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 19 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-13 23:26:56,304 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.29 vs. limit=12.0 2024-08-13 23:27:27,055 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-13 23:27:44,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2366130.0, ans=0.125 2024-08-13 23:27:50,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2366130.0, ans=15.0 2024-08-13 23:27:52,735 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 4750, loss[loss=0.1069, beats_loss=0.01185, ecapa_loss=0.0001766, whisper_loss=0.09329, over 20765.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01075, ecapa_loss=0.0001608, whisper_loss=0.09183, over 3883631.13 frames. ], batch size: 85, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:27:55,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2366230.0, ans=0.1 2024-08-13 23:27:59,056 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 23:28:00,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2366230.0, ans=0.1 2024-08-13 23:28:07,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2366330.0, ans=0.2 2024-08-13 23:28:08,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2366330.0, ans=0.0 2024-08-13 23:28:25,300 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.418e+01 2.670e+01 2.931e+01 4.166e+01, threshold=5.341e+01, percent-clipped=0.0 2024-08-13 23:28:38,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2366530.0, ans=0.125 2024-08-13 23:28:50,766 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.37 vs. limit=12.0 2024-08-13 23:28:51,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2366630.0, ans=0.1 2024-08-13 23:28:52,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2366630.0, ans=0.125 2024-08-13 23:28:57,835 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 4800, loss[loss=0.08171, beats_loss=0.01177, ecapa_loss=0.0001371, whisper_loss=0.06857, over 15410.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01076, ecapa_loss=0.0001615, whisper_loss=0.09157, over 3878431.50 frames. ], batch size: 63, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:29:15,467 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.29 vs. 
limit=6.0 2024-08-13 23:29:17,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2366830.0, ans=0.125 2024-08-13 23:29:20,542 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.02 vs. limit=15.0 2024-08-13 23:29:37,551 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.71 vs. limit=10.0 2024-08-13 23:29:42,074 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 27 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-13 23:29:53,723 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 23:30:02,723 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 4850, loss[loss=0.09119, beats_loss=0.01061, ecapa_loss=0.0001535, whisper_loss=0.07904, over 22939.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01078, ecapa_loss=0.0001624, whisper_loss=0.09096, over 3885604.24 frames. ], batch size: 92, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:30:22,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=2367330.0, ans=0.95 2024-08-13 23:30:33,726 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-13 23:30:34,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2367430.0, ans=0.2 2024-08-13 23:30:34,282 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.47 vs. 
limit=22.5
2024-08-13 23:30:35,051 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.351e+01 2.637e+01 2.912e+01 5.043e+01, threshold=5.273e+01, percent-clipped=0.0
2024-08-13 23:30:49,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2367530.0, ans=0.1
2024-08-13 23:30:53,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2367630.0, ans=0.125
2024-08-13 23:31:07,488 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 4900, loss[loss=0.1133, beats_loss=0.0105, ecapa_loss=0.0001345, whisper_loss=0.1014, over 16916.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01077, ecapa_loss=0.0001628, whisper_loss=0.09112, over 3894616.42 frames. ], batch size: 62, lr: 3.70e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:31:13,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2367730.0, ans=0.125
2024-08-13 23:31:16,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2367730.0, ans=0.0
2024-08-13 23:31:17,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2367730.0, ans=0.1
2024-08-13 23:31:20,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2367830.0, ans=0.0
2024-08-13 23:31:23,162 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.80 vs. limit=15.0
2024-08-13 23:31:27,565 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 31 from LS+wenet, 22 from Vox, 32 from AS
2024-08-13 23:31:27,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2367830.0, ans=0.125
2024-08-13 23:31:28,726 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 13 from Vox, 31 from AS
2024-08-13 23:31:32,993 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 24 from LS+wenet, 11 from Vox, 24 from AS
2024-08-13 23:31:36,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2367930.0, ans=0.125
2024-08-13 23:31:45,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=2367930.0, ans=22.5
2024-08-13 23:31:52,256 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 18 from LS+wenet, 25 from Vox, 37 from AS
2024-08-13 23:32:06,935 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 19 from Vox, 36 from AS
2024-08-13 23:32:13,313 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 4950, loss[loss=0.1004, beats_loss=0.01133, ecapa_loss=0.0002014, whisper_loss=0.08702, over 18876.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01077, ecapa_loss=0.000162, whisper_loss=0.09096, over 3868339.61 frames. ], batch size: 83, lr: 3.70e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:32:21,849 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 11 from LS+wenet, 21 from Vox, 29 from AS
2024-08-13 23:32:45,465 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 29 from Vox, 35 from AS
2024-08-13 23:32:46,508 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.294e+01 2.547e+01 2.845e+01 3.862e+01, threshold=5.095e+01, percent-clipped=0.0
2024-08-13 23:32:58,720 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 from AS
2024-08-13 23:33:08,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2368630.0, ans=0.125
2024-08-13 23:33:13,775 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 41 from LS+wenet, 19 from Vox, 33 from AS
2024-08-13 23:33:19,125 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 5000, loss[loss=0.09543, beats_loss=0.01058, ecapa_loss=0.0001462, whisper_loss=0.08339, over 15914.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01071, ecapa_loss=0.0001632, whisper_loss=0.09093, over 3852861.41 frames. ], batch size: 64, lr: 3.70e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:33:26,622 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 15 from Vox, 48 from AS
2024-08-13 23:33:43,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2368930.0, ans=0.125
2024-08-13 23:33:48,488 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 from AS
2024-08-13 23:33:56,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2369030.0, ans=0.0
2024-08-13 23:34:00,166 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 28 from LS+wenet, 21 from Vox, 30 from AS
2024-08-13 23:34:00,525 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 23:34:23,202 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 5050, loss[loss=0.1125, beats_loss=0.007482, ecapa_loss=0.0002085, whisper_loss=0.103, over 17835.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01079, ecapa_loss=0.0001616, whisper_loss=0.09102, over 3882610.41 frames. ], batch size: 75, lr: 3.70e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:34:30,815 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 27 from LS+wenet, 20 from Vox, 31 from AS
2024-08-13 23:34:37,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2369330.0, ans=0.035
2024-08-13 23:34:38,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2369330.0, ans=0.1
2024-08-13 23:34:40,376 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 34 from LS+wenet, 18 from Vox, 31 from AS
2024-08-13 23:34:41,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2369330.0, ans=0.125
2024-08-13 23:34:42,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2369330.0, ans=0.125
2024-08-13 23:34:55,454 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.306e+01 2.530e+01 2.921e+01 5.103e+01, threshold=5.061e+01, percent-clipped=1.0
2024-08-13 23:34:56,214 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.99 vs. limit=10.0
2024-08-13 23:35:02,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2369530.0, ans=0.125
2024-08-13 23:35:03,329 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 24 from Vox, 26 from AS
2024-08-13 23:35:13,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2369630.0, ans=0.125
2024-08-13 23:35:19,626 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.66 vs. limit=10.0
2024-08-13 23:35:27,809 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 5100, loss[loss=0.108, beats_loss=0.01041, ecapa_loss=0.0001852, whisper_loss=0.09571, over 21317.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01083, ecapa_loss=0.0001618, whisper_loss=0.09137, over 3894871.32 frames. ], batch size: 90, lr: 3.70e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:35:30,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2369730.0, ans=0.0
2024-08-13 23:35:31,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2369730.0, ans=0.125
2024-08-13 23:35:33,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2369730.0, ans=0.125
2024-08-13 23:35:37,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2369730.0, ans=0.0
2024-08-13 23:35:39,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2369830.0, ans=0.0
2024-08-13 23:35:46,261 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-13 23:36:06,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2370030.0, ans=0.0
2024-08-13 23:36:08,990 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 18 from Vox, 34 from AS
2024-08-13 23:36:18,971 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.19 vs. limit=10.0
2024-08-13 23:36:22,175 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 37 from LS+wenet, 18 from Vox, 28 from AS
2024-08-13 23:36:24,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2370130.0, ans=0.0
2024-08-13 23:36:27,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2370130.0, ans=0.07
2024-08-13 23:36:32,309 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 5150, loss[loss=0.09715, beats_loss=0.009889, ecapa_loss=0.0002175, whisper_loss=0.08509, over 19587.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01081, ecapa_loss=0.0001613, whisper_loss=0.09181, over 3942552.67 frames. ], batch size: 83, lr: 3.70e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:36:57,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2370430.0, ans=0.125
2024-08-13 23:37:05,042 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.435e+01 2.636e+01 3.072e+01 5.034e+01, threshold=5.273e+01, percent-clipped=0.0
2024-08-13 23:37:08,954 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 35 from LS+wenet, 21 from Vox, 23 from AS
2024-08-13 23:37:10,158 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 23 from LS+wenet, 14 from Vox, 19 from AS
2024-08-13 23:37:10,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2370530.0, ans=0.2
2024-08-13 23:37:17,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2370530.0, ans=0.125
2024-08-13 23:37:19,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2370530.0, ans=0.2
2024-08-13 23:37:32,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2370630.0, ans=0.125
2024-08-13 23:37:37,332 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 5200, loss[loss=0.07183, beats_loss=0.01361, ecapa_loss=0.0001448, whisper_loss=0.05678, over 16931.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01075, ecapa_loss=0.0001614, whisper_loss=0.09187, over 3902675.63 frames. ], batch size: 69, lr: 3.70e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:37:54,354 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.19 vs. limit=15.0
2024-08-13 23:38:06,672 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 17 from LS+wenet, 17 from Vox, 33 from AS
2024-08-13 23:38:15,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2371030.0, ans=0.1
2024-08-13 23:38:38,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2371130.0, ans=0.0
2024-08-13 23:38:39,695 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 20 from LS+wenet, 24 from Vox, 30 from AS
2024-08-13 23:38:40,798 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 5250, loss[loss=0.09903, beats_loss=0.01111, ecapa_loss=0.0002108, whisper_loss=0.08581, over 17285.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01075, ecapa_loss=0.0001615, whisper_loss=0.09137, over 3867602.57 frames. ], batch size: 74, lr: 3.70e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:38:55,523 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 21 from LS+wenet, 26 from Vox, 40 from AS
2024-08-13 23:39:07,014 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 20 from Vox, 44 from AS
2024-08-13 23:39:08,372 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 28 from Vox, 32 from AS
2024-08-13 23:39:09,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2371430.0, ans=0.125
2024-08-13 23:39:13,400 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.686e+01 2.304e+01 2.584e+01 2.839e+01 8.080e+01, threshold=5.168e+01, percent-clipped=1.0
2024-08-13 23:39:20,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2371530.0, ans=0.0
2024-08-13 23:39:22,292 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 from AS
2024-08-13 23:39:28,682 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 from AS
2024-08-13 23:39:43,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2371630.0, ans=0.125
2024-08-13 23:39:45,201 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 5300, loss[loss=0.1213, beats_loss=0.01035, ecapa_loss=0.0001673, whisper_loss=0.1093, over 23257.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01075, ecapa_loss=0.0001623, whisper_loss=0.09128, over 3882447.23 frames. ], batch size: 94, lr: 3.70e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:39:45,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2371730.0, ans=0.1
2024-08-13 23:39:48,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2371730.0, ans=0.125
2024-08-13 23:40:22,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2372030.0, ans=0.1
2024-08-13 23:40:30,039 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 13 from LS+wenet, 22 from Vox, 29 from AS
2024-08-13 23:40:31,340 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 14 from Vox, 29 from AS
2024-08-13 23:40:32,900 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 16 from Vox, 21 from AS
2024-08-13 23:40:35,407 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 from AS
2024-08-13 23:40:39,324 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 25 from Vox, 34 from AS
2024-08-13 23:40:43,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2372130.0, ans=0.2
2024-08-13 23:40:45,824 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.15 vs. limit=12.0
2024-08-13 23:40:46,607 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 15 from Vox, 37 from AS
2024-08-13 23:40:48,936 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 5350, loss[loss=0.1114, beats_loss=0.01133, ecapa_loss=0.0001665, whisper_loss=0.09842, over 23096.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01081, ecapa_loss=0.00016, whisper_loss=0.09052, over 3867117.94 frames. ], batch size: 91, lr: 3.70e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:40:51,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=2372230.0, ans=12.0
2024-08-13 23:40:58,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2372230.0, ans=0.125
2024-08-13 23:41:02,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2372330.0, ans=0.0
2024-08-13 23:41:02,829 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0
2024-08-13 23:41:05,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2372330.0, ans=0.125
2024-08-13 23:41:19,050 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 43 from LS+wenet, 22 from Vox, 25 from AS
2024-08-13 23:41:21,441 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.441e+01 2.659e+01 2.902e+01 4.183e+01, threshold=5.318e+01, percent-clipped=0.0
2024-08-13 23:41:22,829 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 32 from LS+wenet, 22 from Vox, 32 from AS
2024-08-13 23:41:29,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2372530.0, ans=0.09899494936611666
2024-08-13 23:41:44,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2372630.0, ans=0.0
2024-08-13 23:41:53,154 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 5400, loss[loss=0.1031, beats_loss=0.01142, ecapa_loss=0.0001414, whisper_loss=0.09022, over 22793.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01074, ecapa_loss=0.000159, whisper_loss=0.09129, over 3900526.69 frames. ], batch size: 93, lr: 3.70e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:41:53,287 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 20 from Vox, 30 from AS
2024-08-13 23:41:59,818 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 23 from LS+wenet, 13 from Vox, 27 from AS
2024-08-13 23:42:04,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2372830.0, ans=0.125
2024-08-13 23:42:13,284 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0
2024-08-13 23:42:26,894 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.91 vs. limit=15.0
2024-08-13 23:42:31,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2373030.0, ans=0.125
2024-08-13 23:42:34,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2373030.0, ans=0.2
2024-08-13 23:42:43,426 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.68 vs. limit=15.0
2024-08-13 23:42:45,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2373130.0, ans=0.2
2024-08-13 23:42:57,386 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 5450, loss[loss=0.1091, beats_loss=0.009784, ecapa_loss=0.0001397, whisper_loss=0.09792, over 14596.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01079, ecapa_loss=0.0001592, whisper_loss=0.09072, over 3870593.35 frames. ], batch size: 55, lr: 3.70e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:43:14,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2373330.0, ans=0.0
2024-08-13 23:43:29,350 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.699e+01 2.305e+01 2.546e+01 2.870e+01 4.387e+01, threshold=5.093e+01, percent-clipped=0.0
2024-08-13 23:43:34,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2373430.0, ans=0.125
2024-08-13 23:43:38,062 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.25 vs. limit=15.0
2024-08-13 23:43:42,376 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 13 from Vox, 31 from AS
2024-08-13 23:43:49,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=2373630.0, ans=0.025
2024-08-13 23:43:53,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2373630.0, ans=0.125
2024-08-13 23:44:02,540 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 5500, loss[loss=0.103, beats_loss=0.01149, ecapa_loss=0.0001498, whisper_loss=0.09003, over 15639.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01075, ecapa_loss=0.0001606, whisper_loss=0.09074, over 3896733.73 frames. ], batch size: 61, lr: 3.70e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:44:10,629 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.84 vs. limit=15.0
2024-08-13 23:44:11,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2373730.0, ans=0.0
2024-08-13 23:44:11,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2373730.0, ans=0.1
2024-08-13 23:44:12,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2373730.0, ans=0.125
2024-08-13 23:44:14,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2373730.0, ans=0.125
2024-08-13 23:44:29,734 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 26 from LS+wenet, 18 from Vox, 27 from AS
2024-08-13 23:44:49,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2374030.0, ans=0.2
2024-08-13 23:44:54,645 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.20 vs. limit=15.0
2024-08-13 23:45:14,986 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 5550, loss[loss=0.08445, beats_loss=0.01003, ecapa_loss=0.0001543, whisper_loss=0.07288, over 16243.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01078, ecapa_loss=0.0001598, whisper_loss=0.09086, over 3907266.52 frames. ], batch size: 64, lr: 3.70e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:45:44,749 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-13 23:45:48,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2374430.0, ans=0.015
2024-08-13 23:45:51,015 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.310e+01 2.523e+01 2.896e+01 4.190e+01, threshold=5.046e+01, percent-clipped=0.0
2024-08-13 23:45:53,724 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.334e+01
2024-08-13 23:46:00,426 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 19 from Vox, 30 from AS
2024-08-13 23:46:15,879 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 32 from LS+wenet, 21 from Vox, 32 from AS
2024-08-13 23:46:26,395 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 5600, loss[loss=0.1071, beats_loss=0.009498, ecapa_loss=0.0001792, whisper_loss=0.09586, over 21784.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0108, ecapa_loss=0.0001597, whisper_loss=0.09127, over 3885599.60 frames. ], batch size: 89, lr: 3.70e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:46:26,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2374730.0, ans=0.125
2024-08-13 23:46:29,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2374730.0, ans=0.2
2024-08-13 23:46:44,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2374830.0, ans=0.2
2024-08-13 23:46:44,180 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.548e-03
2024-08-13 23:46:49,686 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 28 from Vox, 31 from AS
2024-08-13 23:46:59,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2374930.0, ans=0.125
2024-08-13 23:46:59,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2374930.0, ans=0.125
2024-08-13 23:47:02,390 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 from AS
2024-08-13 23:47:05,845 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 19 from Vox, 37 from AS
2024-08-13 23:47:16,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2375030.0, ans=10.0
2024-08-13 23:47:27,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2375130.0, ans=0.125
2024-08-13 23:47:27,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2375130.0, ans=0.125
2024-08-13 23:47:34,194 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.76 vs. limit=10.0
2024-08-13 23:47:39,949 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 5650, loss[loss=0.08498, beats_loss=0.01007, ecapa_loss=0.0001782, whisper_loss=0.07313, over 14285.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01082, ecapa_loss=0.0001597, whisper_loss=0.09048, over 3880140.49 frames. ], batch size: 59, lr: 3.70e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:47:40,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2375230.0, ans=0.125
2024-08-13 23:47:43,414 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=15.0
2024-08-13 23:47:57,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2375330.0, ans=0.125
2024-08-13 23:47:57,247 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.731e-03
2024-08-13 23:48:03,831 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 from AS
2024-08-13 23:48:13,899 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.432e+01 2.622e+01 2.958e+01 1.611e+02, threshold=5.244e+01, percent-clipped=2.0
2024-08-13 23:48:14,301 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.045e-02
2024-08-13 23:48:24,268 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.08 vs. limit=10.0
2024-08-13 23:48:27,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2375530.0, ans=0.125
2024-08-13 23:48:32,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2375630.0, ans=0.125
2024-08-13 23:48:46,063 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 5700, loss[loss=0.07703, beats_loss=0.01279, ecapa_loss=0.0001426, whisper_loss=0.06281, over 21724.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01076, ecapa_loss=0.0001601, whisper_loss=0.09096, over 3936663.54 frames. ], batch size: 88, lr: 3.70e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:48:49,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2375730.0, ans=0.2
2024-08-13 23:48:55,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2375730.0, ans=0.2
2024-08-13 23:49:08,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2375830.0, ans=0.025
2024-08-13 23:49:08,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2375830.0, ans=0.0
2024-08-13 23:49:09,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2375830.0, ans=0.0
2024-08-13 23:49:33,942 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 17 from Vox, 34 from AS
2024-08-13 23:49:37,791 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.15 vs. limit=15.0
2024-08-13 23:49:44,272 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 21 from Vox, 35 from AS
2024-08-13 23:49:45,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2376130.0, ans=0.125
2024-08-13 23:49:47,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2376130.0, ans=0.0
2024-08-13 23:49:52,844 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 15 from Vox, 36 from AS
2024-08-13 23:49:57,083 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 5750, loss[loss=0.1088, beats_loss=0.009932, ecapa_loss=0.0001941, whisper_loss=0.09695, over 18189.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01077, ecapa_loss=0.0001612, whisper_loss=0.09056, over 3924733.09 frames. ], batch size: 76, lr: 3.69e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:50:02,942 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 31 from LS+wenet, 17 from Vox, 38 from AS
2024-08-13 23:50:04,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2376230.0, ans=0.125
2024-08-13 23:50:32,595 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.376e+01 2.677e+01 2.886e+01 5.408e+01, threshold=5.355e+01, percent-clipped=1.0
2024-08-13 23:50:38,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=2376430.0, ans=0.5
2024-08-13 23:50:47,447 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 14 from Vox, 28 from AS
2024-08-13 23:50:52,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2376530.0, ans=0.1
2024-08-13 23:51:04,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2376630.0, ans=0.0
2024-08-13 23:51:09,483 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 5800, loss[loss=0.1089, beats_loss=0.009272, ecapa_loss=0.0001603, whisper_loss=0.09805, over 21074.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01072, ecapa_loss=0.0001612, whisper_loss=0.09079, over 3879952.31 frames. ], batch size: 81, lr: 3.69e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:51:31,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2376830.0, ans=0.0
2024-08-13 23:51:31,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2376830.0, ans=0.125
2024-08-13 23:51:34,926 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 from AS
2024-08-13 23:51:36,926 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.91 vs. limit=15.0
2024-08-13 23:51:46,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2376930.0, ans=0.1
2024-08-13 23:51:55,007 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 15 from LS+wenet, 17 from Vox, 32 from AS
2024-08-13 23:52:15,500 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=16.42 vs. limit=15.0
2024-08-13 23:52:16,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2377130.0, ans=0.0
2024-08-13 23:52:18,445 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 5850, loss[loss=0.08759, beats_loss=0.01145, ecapa_loss=0.0001705, whisper_loss=0.07444, over 16228.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01076, ecapa_loss=0.0001616, whisper_loss=0.09049, over 3884558.09 frames. ], batch size: 65, lr: 3.69e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:52:21,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2377230.0, ans=0.0
2024-08-13 23:52:29,801 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.57 vs. limit=15.0
2024-08-13 23:52:35,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=2377330.0, ans=0.1
2024-08-13 23:52:42,870 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 11 from Vox, 30 from AS
2024-08-13 23:52:50,900 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.427e+01 2.667e+01 3.028e+01 6.435e+01, threshold=5.335e+01, percent-clipped=1.0
2024-08-13 23:53:04,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2377530.0, ans=0.0
2024-08-13 23:53:05,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2377530.0, ans=0.125
2024-08-13 23:53:22,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=2377630.0, ans=8.0
2024-08-13 23:53:23,662 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 5900, loss[loss=0.09443, beats_loss=0.009593, ecapa_loss=0.0002008, whisper_loss=0.08283, over 13851.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01073, ecapa_loss=0.000162, whisper_loss=0.09062, over 3869108.57 frames. ], batch size: 56, lr: 3.69e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:53:23,837 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 13 from LS+wenet, 21 from Vox, 25 from AS
2024-08-13 23:53:39,615 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 16 from LS+wenet, 17 from Vox, 42 from AS
2024-08-13 23:53:56,448 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 24 from Vox, 25 from AS
2024-08-13 23:53:59,759 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-13 23:54:05,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2378030.0, ans=0.125
2024-08-13 23:54:07,726 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.88 vs. limit=15.0
2024-08-13 23:54:10,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2378030.0, ans=0.125
2024-08-13 23:54:17,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2378130.0, ans=0.125
2024-08-13 23:54:19,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2378130.0, ans=0.125
2024-08-13 23:54:24,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2378130.0, ans=0.07
2024-08-13 23:54:28,269 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 5950, loss[loss=0.117, beats_loss=0.00952, ecapa_loss=0.0001586, whisper_loss=0.1059, over 21984.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0108, ecapa_loss=0.0001614, whisper_loss=0.09022, over 3873140.72 frames. ], batch size: 88, lr: 3.69e-03, grad_scale: 1.152921504606847e+18
2024-08-13 23:54:33,769 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-13 23:54:41,219 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 28 from LS+wenet, 21 from Vox, 26 from AS
2024-08-13 23:54:47,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2378330.0, ans=0.125
2024-08-13 23:54:50,357 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 28 from Vox, 39 from AS
2024-08-13 23:55:00,182 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.346e+01 2.593e+01 2.833e+01 5.502e+01, threshold=5.186e+01, percent-clipped=1.0
2024-08-13 23:55:00,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2378430.0, ans=0.125
2024-08-13 23:55:32,584 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 6000, loss[loss=0.1075, beats_loss=0.01064, ecapa_loss=0.0001459, whisper_loss=0.09537, over 15934.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01086, ecapa_loss=0.000161, whisper_loss=0.09051, over 3889634.06 frames. ], batch size: 62, lr: 3.69e-03, grad_scale: 1.152921504606847e+18
2024-08-13 23:55:32,585 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss
2024-08-13 23:56:14,134 INFO [train_multi_KD3.py:1149] (2/4) Epoch 17, validation on ASR_libri: loss=0.2527, beats_loss=0, ecapa_loss=0.0005558, whisper_loss=0.2472, over 922467.00 frames.
2024-08-13 23:56:35,169 INFO [train_multi_KD3.py:1149] (2/4) Epoch 17, validation on SV_voxceleb1: loss=0.004377, beats_loss=0, ecapa_loss=0.0004377, whisper_loss=0, over 939242.00 frames.
2024-08-13 23:58:33,260 INFO [train_multi_KD3.py:1149] (2/4) Epoch 17, validation on AT_audioset: loss=0.02362, beats_loss=0.02362, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-13 23:58:33,264 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-13 23:58:44,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2378730.0, ans=0.2 2024-08-13 23:58:50,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2378830.0, ans=0.125 2024-08-13 23:58:54,054 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 27 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-13 23:59:20,706 WARNING [optim.py:496] (2/4) Scaling gradients by 0.09559547156095505, model_norm_threshold=51.8635368347168 2024-08-13 23:59:20,936 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.26, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.554e+04, grad_sumsq=7.554e+04, orig_rms_sq=1.000e+00 2024-08-13 23:59:37,688 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0 2024-08-13 23:59:38,513 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 14 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-13 23:59:44,460 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 6050, loss[loss=0.1011, beats_loss=0.009245, ecapa_loss=0.0001548, whisper_loss=0.09033, over 18574.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01084, ecapa_loss=0.0001602, whisper_loss=0.0906, over 3873219.00 frames. ], batch size: 71, lr: 3.69e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:59:49,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2379230.0, ans=0.0 2024-08-13 23:59:50,332 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-13 23:59:55,796 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
32 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-14 00:00:20,336 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.70 vs. limit=10.0 2024-08-14 00:00:20,916 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.343e+01 2.535e+01 2.756e+01 5.425e+02, threshold=5.070e+01, percent-clipped=3.0 2024-08-14 00:00:24,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2379430.0, ans=0.0 2024-08-14 00:00:35,460 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.280e-03 2024-08-14 00:00:38,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2379530.0, ans=0.125 2024-08-14 00:00:40,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2379530.0, ans=0.125 2024-08-14 00:00:57,815 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 6100, loss[loss=0.1036, beats_loss=0.009841, ecapa_loss=0.0001842, whisper_loss=0.09193, over 19146.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01088, ecapa_loss=0.0001597, whisper_loss=0.09076, over 3885934.83 frames. ], batch size: 79, lr: 3.69e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:01:10,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2379830.0, ans=0.125 2024-08-14 00:01:16,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2379830.0, ans=0.2 2024-08-14 00:01:16,649 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.85 vs. 
limit=15.0 2024-08-14 00:01:21,246 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 19 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-14 00:01:22,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2379830.0, ans=0.125 2024-08-14 00:02:04,686 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 6150, loss[loss=0.09375, beats_loss=0.01043, ecapa_loss=0.0001913, whisper_loss=0.08141, over 13855.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01084, ecapa_loss=0.0001607, whisper_loss=0.09075, over 3880990.09 frames. ], batch size: 57, lr: 3.69e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:02:06,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2380230.0, ans=0.0 2024-08-14 00:02:11,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2380230.0, ans=0.125 2024-08-14 00:02:36,937 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.475e+01 2.774e+01 3.233e+01 4.746e+01, threshold=5.548e+01, percent-clipped=1.0 2024-08-14 00:02:37,793 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.71 vs. limit=15.0 2024-08-14 00:02:42,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2380530.0, ans=0.0 2024-08-14 00:02:54,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2380530.0, ans=0.125 2024-08-14 00:03:09,100 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 6200, loss[loss=0.08947, beats_loss=0.01169, ecapa_loss=0.0001416, whisper_loss=0.07637, over 18268.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01091, ecapa_loss=0.0001603, whisper_loss=0.09067, over 3901098.80 frames. 
], batch size: 74, lr: 3.69e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:03:14,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2380730.0, ans=0.2 2024-08-14 00:03:23,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2380830.0, ans=0.125 2024-08-14 00:03:23,886 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-14 00:03:24,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2380830.0, ans=0.0 2024-08-14 00:03:25,189 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 00:03:27,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2380830.0, ans=0.1 2024-08-14 00:03:58,403 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-14 00:04:10,444 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.65 vs. limit=15.0 2024-08-14 00:04:15,111 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 6250, loss[loss=0.09452, beats_loss=0.01135, ecapa_loss=0.0001349, whisper_loss=0.08182, over 15242.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01087, ecapa_loss=0.00016, whisper_loss=0.09082, over 3907210.64 frames. ], batch size: 56, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:04:19,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2381230.0, ans=0.2 2024-08-14 00:04:25,414 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
20 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-14 00:04:33,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2381330.0, ans=0.125 2024-08-14 00:04:34,329 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 30 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 00:04:41,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2381430.0, ans=0.125 2024-08-14 00:04:44,871 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 00:04:48,487 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.400e+01 2.693e+01 3.116e+01 1.076e+02, threshold=5.386e+01, percent-clipped=3.0 2024-08-14 00:04:50,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2381430.0, ans=0.125 2024-08-14 00:04:52,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2381530.0, ans=0.0 2024-08-14 00:05:07,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2381630.0, ans=0.0 2024-08-14 00:05:14,138 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.88 vs. limit=15.0 2024-08-14 00:05:19,759 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 6300, loss[loss=0.08806, beats_loss=0.01204, ecapa_loss=0.000125, whisper_loss=0.07477, over 18534.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01089, ecapa_loss=0.0001597, whisper_loss=0.09087, over 3892225.67 frames. 
], batch size: 73, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:05:20,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2381730.0, ans=0.125 2024-08-14 00:05:36,395 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 28 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 00:05:41,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2381830.0, ans=0.1 2024-08-14 00:05:44,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2381930.0, ans=0.125 2024-08-14 00:05:49,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2381930.0, ans=0.1 2024-08-14 00:05:55,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2381930.0, ans=0.125 2024-08-14 00:05:59,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2382030.0, ans=0.1 2024-08-14 00:06:02,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2382030.0, ans=0.125 2024-08-14 00:06:03,405 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 14 from Vox, 47 fro AS 2024-08-14 00:06:13,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2382130.0, ans=0.1 2024-08-14 00:06:23,695 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 6350, loss[loss=0.08093, beats_loss=0.01223, ecapa_loss=0.0001404, whisper_loss=0.0673, over 14048.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01093, ecapa_loss=0.0001591, whisper_loss=0.09024, over 3869909.21 frames. 
], batch size: 54, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:06:34,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2382230.0, ans=0.1 2024-08-14 00:06:39,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2382330.0, ans=0.0 2024-08-14 00:06:39,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2382330.0, ans=0.0 2024-08-14 00:06:43,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2382330.0, ans=0.1 2024-08-14 00:06:44,366 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.96 vs. limit=15.0 2024-08-14 00:06:46,433 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.12 vs. limit=15.0 2024-08-14 00:06:53,993 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-14 00:06:57,638 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.344e+01 2.620e+01 2.945e+01 1.011e+02, threshold=5.239e+01, percent-clipped=2.0 2024-08-14 00:07:03,177 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-14 00:07:11,204 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.12 vs. 
limit=12.0 2024-08-14 00:07:20,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2382630.0, ans=0.0 2024-08-14 00:07:22,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2382630.0, ans=0.125 2024-08-14 00:07:25,155 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 24 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-14 00:07:28,794 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 6400, loss[loss=0.1123, beats_loss=0.009826, ecapa_loss=0.0001368, whisper_loss=0.1011, over 15315.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01087, ecapa_loss=0.0001585, whisper_loss=0.09069, over 3886560.65 frames. ], batch size: 58, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:07:43,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=2382830.0, ans=10.0 2024-08-14 00:07:52,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2382830.0, ans=0.125 2024-08-14 00:07:57,966 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 33 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 00:08:00,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2382930.0, ans=0.125 2024-08-14 00:08:02,828 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 12 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-14 00:08:18,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2383030.0, ans=0.125 2024-08-14 00:08:28,467 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
13 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-14 00:08:34,035 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 6450, loss[loss=0.1149, beats_loss=0.01152, ecapa_loss=0.0001402, whisper_loss=0.102, over 14461.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01085, ecapa_loss=0.0001592, whisper_loss=0.09064, over 3868434.14 frames. ], batch size: 54, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:09:03,408 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 18 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-14 00:09:06,866 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.325e+01 2.600e+01 2.932e+01 4.417e+01, threshold=5.200e+01, percent-clipped=0.0 2024-08-14 00:09:15,275 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-14 00:09:19,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2383530.0, ans=0.2 2024-08-14 00:09:37,821 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 6500, loss[loss=0.09693, beats_loss=0.01135, ecapa_loss=0.0001195, whisper_loss=0.08439, over 18514.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01085, ecapa_loss=0.0001601, whisper_loss=0.09097, over 3886467.23 frames. ], batch size: 72, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:09:40,965 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=6.483e-02 2024-08-14 00:09:49,726 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
31 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 00:09:53,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2383830.0, ans=0.125 2024-08-14 00:10:06,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2383930.0, ans=0.2 2024-08-14 00:10:13,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2383930.0, ans=0.125 2024-08-14 00:10:20,691 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 24 from Vox, 18 fro AS 2024-08-14 00:10:22,491 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.00 vs. limit=15.0 2024-08-14 00:10:30,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2384130.0, ans=0.1 2024-08-14 00:10:31,078 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 26 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-14 00:10:41,711 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 6550, loss[loss=0.1304, beats_loss=0.009826, ecapa_loss=0.0001474, whisper_loss=0.1191, over 23077.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01083, ecapa_loss=0.0001588, whisper_loss=0.09183, over 3922983.12 frames. ], batch size: 91, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:10:44,436 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 16 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-14 00:10:52,342 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-14 00:10:57,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2384330.0, ans=0.1 2024-08-14 00:11:15,102 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.424e+01 2.648e+01 2.996e+01 4.448e+01, threshold=5.297e+01, percent-clipped=0.0 2024-08-14 00:11:21,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2384530.0, ans=0.0 2024-08-14 00:11:23,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2384530.0, ans=0.2 2024-08-14 00:11:45,382 WARNING [optim.py:496] (2/4) Scaling gradients by 0.09902217984199524, model_norm_threshold=52.96651840209961 2024-08-14 00:11:45,553 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.30, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.471e+04, grad_sumsq=8.471e+04, orig_rms_sq=1.000e+00 2024-08-14 00:11:45,585 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 6600, loss[loss=0.1277, beats_loss=0.007527, ecapa_loss=0.0002056, whisper_loss=0.1182, over 15778.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01072, ecapa_loss=0.0001605, whisper_loss=0.09254, over 3941127.89 frames. ], batch size: 61, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:11:48,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2384730.0, ans=0.035 2024-08-14 00:11:51,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2384730.0, ans=0.125 2024-08-14 00:12:00,314 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-14 00:12:03,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2384830.0, ans=0.125 2024-08-14 00:12:06,743 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.92 vs. limit=15.0 2024-08-14 00:12:09,807 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-14 00:12:13,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2384930.0, ans=0.2 2024-08-14 00:12:14,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2384930.0, ans=0.0 2024-08-14 00:12:18,363 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-14 00:12:18,888 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.20 vs. limit=15.0 2024-08-14 00:12:25,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2385030.0, ans=0.07 2024-08-14 00:12:31,235 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-14 00:12:49,208 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 6650, loss[loss=0.09626, beats_loss=0.01241, ecapa_loss=0.0001212, whisper_loss=0.08264, over 20795.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01072, ecapa_loss=0.0001591, whisper_loss=0.09256, over 3962199.81 frames. 
], batch size: 81, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:13:12,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2385330.0, ans=0.1 2024-08-14 00:13:22,212 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.456e+01 2.724e+01 3.056e+01 5.349e+02, threshold=5.448e+01, percent-clipped=1.0 2024-08-14 00:13:23,725 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 17 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-14 00:13:28,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2385530.0, ans=0.125 2024-08-14 00:13:33,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2385530.0, ans=0.0 2024-08-14 00:13:40,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2385630.0, ans=0.125 2024-08-14 00:13:43,303 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.109e+01 2024-08-14 00:13:44,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2385630.0, ans=0.125 2024-08-14 00:13:53,114 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 6700, loss[loss=0.1089, beats_loss=0.01018, ecapa_loss=0.0001594, whisper_loss=0.09714, over 22655.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01074, ecapa_loss=0.0001584, whisper_loss=0.09173, over 3936858.86 frames. ], batch size: 88, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:14:01,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2385730.0, ans=0.125 2024-08-14 00:14:10,883 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
17 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-14 00:14:34,924 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.44 vs. limit=15.0 2024-08-14 00:14:52,471 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 24 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-14 00:14:53,633 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-14 00:14:56,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2386230.0, ans=0.125 2024-08-14 00:14:57,574 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 6750, loss[loss=0.1439, beats_loss=0.006851, ecapa_loss=0.0001614, whisper_loss=0.1354, over 21622.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01078, ecapa_loss=0.0001586, whisper_loss=0.09167, over 3921538.93 frames. ], batch size: 81, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:15:03,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2386230.0, ans=0.125 2024-08-14 00:15:03,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2386230.0, ans=0.1 2024-08-14 00:15:09,337 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 00:15:09,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2386330.0, ans=0.0 2024-08-14 00:15:17,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2386330.0, ans=0.1 2024-08-14 00:15:31,289 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.377e+01 2.658e+01 2.891e+01 6.359e+01, threshold=5.316e+01, percent-clipped=1.0 2024-08-14 00:15:35,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2386530.0, ans=0.0 2024-08-14 00:15:36,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2386530.0, ans=0.125 2024-08-14 00:15:44,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2386530.0, ans=0.2 2024-08-14 00:15:48,501 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 25 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-14 00:16:02,326 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 6800, loss[loss=0.1008, beats_loss=0.01014, ecapa_loss=0.0001668, whisper_loss=0.08894, over 18092.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01072, ecapa_loss=0.0001608, whisper_loss=0.09155, over 3913777.55 frames. ], batch size: 71, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:16:17,655 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
32 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 00:16:30,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2386930.0, ans=0.125 2024-08-14 00:16:32,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2386930.0, ans=0.025 2024-08-14 00:16:51,985 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.54 vs. limit=22.5 2024-08-14 00:16:59,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2387130.0, ans=0.125 2024-08-14 00:17:01,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2387130.0, ans=0.125 2024-08-14 00:17:06,491 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 6850, loss[loss=0.09402, beats_loss=0.01078, ecapa_loss=0.0001852, whisper_loss=0.08138, over 20744.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01072, ecapa_loss=0.0001612, whisper_loss=0.09074, over 3864531.32 frames. ], batch size: 85, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:17:27,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2387330.0, ans=0.0 2024-08-14 00:17:33,984 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
11 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-14 00:17:40,251 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 00:17:41,089 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.424e+01 2.658e+01 2.894e+01 9.462e+01, threshold=5.316e+01, percent-clipped=2.0 2024-08-14 00:17:47,009 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.77 vs. limit=15.0 2024-08-14 00:18:11,972 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 6900, loss[loss=0.08883, beats_loss=0.01054, ecapa_loss=0.0001918, whisper_loss=0.07638, over 20974.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01077, ecapa_loss=0.0001613, whisper_loss=0.09101, over 3868096.99 frames. ], batch size: 89, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:18:15,707 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-14 00:18:17,721 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-14 00:18:24,847 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 00:18:28,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2387830.0, ans=0.125 2024-08-14 00:18:45,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2387930.0, ans=0.125 2024-08-14 00:18:47,127 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-14 00:19:05,775 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
22 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 00:19:20,618 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 6950, loss[loss=0.1225, beats_loss=0.007701, ecapa_loss=0.0001643, whisper_loss=0.1132, over 19254.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01085, ecapa_loss=0.0001609, whisper_loss=0.09103, over 3871127.78 frames. ], batch size: 75, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:19:25,353 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 00:20:00,091 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.59 vs. limit=6.0 2024-08-14 00:20:00,391 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.095e+01 2.466e+01 2.702e+01 3.028e+01 4.381e+01, threshold=5.405e+01, percent-clipped=0.0 2024-08-14 00:20:04,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2388430.0, ans=0.0 2024-08-14 00:20:33,404 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 25 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-14 00:20:38,020 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 7000, loss[loss=0.1001, beats_loss=0.01215, ecapa_loss=0.0001625, whisper_loss=0.08631, over 22554.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01086, ecapa_loss=0.0001601, whisper_loss=0.09077, over 3866827.62 frames. ], batch size: 93, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:20:40,212 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.890e-01 2024-08-14 00:21:19,388 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-14 00:21:20,782 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
33 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 00:21:30,774 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 16 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-14 00:21:38,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2389030.0, ans=0.07 2024-08-14 00:21:47,572 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 00:21:53,219 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-14 00:21:58,305 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 7050, loss[loss=0.0898, beats_loss=0.01014, ecapa_loss=0.0001828, whisper_loss=0.07784, over 13350.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01087, ecapa_loss=0.0001614, whisper_loss=0.09125, over 3887116.09 frames. ], batch size: 53, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:22:36,457 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2024-08-14 00:22:42,102 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.266e+01 2.592e+01 2.903e+01 1.485e+02, threshold=5.183e+01, percent-clipped=2.0 2024-08-14 00:22:51,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2389530.0, ans=0.0 2024-08-14 00:22:57,111 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-14 00:23:01,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2389530.0, ans=0.125 2024-08-14 00:23:19,729 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 7100, loss[loss=0.1092, beats_loss=0.01107, ecapa_loss=0.000147, whisper_loss=0.09664, over 18771.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01086, ecapa_loss=0.0001601, whisper_loss=0.09102, over 3881636.54 frames. ], batch size: 73, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:23:21,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2389730.0, ans=0.0 2024-08-14 00:23:34,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2389830.0, ans=0.0 2024-08-14 00:23:34,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2389830.0, ans=0.0 2024-08-14 00:23:34,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2389830.0, ans=0.2 2024-08-14 00:23:38,354 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 17 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-14 00:23:45,315 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.73 vs. limit=10.0 2024-08-14 00:23:59,110 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 25 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-14 00:24:03,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2389930.0, ans=0.0 2024-08-14 00:24:16,336 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 00:24:32,748 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 21 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-14 00:24:37,460 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-14 00:24:39,429 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 7150, loss[loss=0.1016, beats_loss=0.009014, ecapa_loss=0.0001766, whisper_loss=0.0908, over 14929.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01092, ecapa_loss=0.0001593, whisper_loss=0.0904, over 3876450.74 frames. ], batch size: 56, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:24:47,900 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=15.0 2024-08-14 00:24:52,971 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.13 vs. limit=15.0 2024-08-14 00:25:05,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2390330.0, ans=0.2 2024-08-14 00:25:09,123 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-14 00:25:18,741 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 15 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-14 00:25:20,423 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.385e+01 2.638e+01 3.035e+01 4.278e+01, threshold=5.277e+01, percent-clipped=0.0 2024-08-14 00:25:23,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2390430.0, ans=0.05 2024-08-14 00:25:35,729 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-14 00:25:37,479 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.65 vs. limit=10.0 2024-08-14 00:25:39,328 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
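The recurring `Clipping_scale=2.0, grad-norm quartiles …, threshold=…, percent-clipped=…` entries read as a five-number summary of recent gradient norms, with the threshold tracking roughly `clipping_scale` times the median (e.g. 2 × 2.638e+01 ≈ 5.277e+01 in the line above). A rough stdlib sketch of that bookkeeping, under those assumptions and with a hypothetical function name:

```python
import statistics

def grad_norm_stats(norms, clipping_scale=2.0):
    """Summarize a window of gradient norms the way the log lines do.

    Assumption: the logged quartiles are (min, Q1, median, Q3, max) and the
    clipping threshold is clipping_scale * median of the window.
    """
    q1, med, q3 = statistics.quantiles(norms, n=4)
    threshold = clipping_scale * med
    pct_clipped = 100.0 * sum(n > threshold for n in norms) / len(norms)
    return (min(norms), q1, med, q3, max(norms)), threshold, pct_clipped

# Toy window of norms, not taken from the log:
summary, thr, pct = grad_norm_stats([10, 20, 30, 40, 100])
print(thr, pct)  # → 60.0 20.0
```

In the actual optimizer the threshold is presumably smoothed over many batches, so logged thresholds only approximately equal twice the logged median.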
19 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-14 00:25:39,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2390530.0, ans=0.1 2024-08-14 00:25:42,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2390630.0, ans=0.1 2024-08-14 00:25:57,501 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 7200, loss[loss=0.1182, beats_loss=0.007877, ecapa_loss=0.0001636, whisper_loss=0.1087, over 19886.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01088, ecapa_loss=0.0001598, whisper_loss=0.09039, over 3868905.72 frames. ], batch size: 77, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:26:12,458 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-14 00:26:19,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2390830.0, ans=0.0 2024-08-14 00:26:20,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2390830.0, ans=0.125 2024-08-14 00:26:42,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2391030.0, ans=0.1 2024-08-14 00:27:02,096 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.91 vs. limit=15.0 2024-08-14 00:27:06,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2391130.0, ans=0.0 2024-08-14 00:27:06,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2391130.0, ans=0.0 2024-08-14 00:27:09,261 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
27 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-14 00:27:14,482 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 7250, loss[loss=0.1193, beats_loss=0.009941, ecapa_loss=0.0001879, whisper_loss=0.1075, over 16446.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01088, ecapa_loss=0.00016, whisper_loss=0.09095, over 3888397.38 frames. ], batch size: 68, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:27:15,030 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-14 00:27:37,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2391330.0, ans=0.125 2024-08-14 00:27:40,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2391330.0, ans=10.0 2024-08-14 00:27:43,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2391330.0, ans=0.05 2024-08-14 00:27:49,603 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.40 vs. limit=10.0 2024-08-14 00:27:54,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2391430.0, ans=0.2 2024-08-14 00:27:55,097 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.401e+01 2.589e+01 2.911e+01 7.095e+01, threshold=5.179e+01, percent-clipped=1.0 2024-08-14 00:28:09,796 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-14 00:28:12,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2391530.0, ans=0.0 2024-08-14 00:28:13,790 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
35 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-14 00:28:33,710 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 7300, loss[loss=0.08972, beats_loss=0.01213, ecapa_loss=0.0001338, whisper_loss=0.07626, over 23128.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01082, ecapa_loss=0.0001598, whisper_loss=0.09188, over 3908364.43 frames. ], batch size: 92, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:28:37,611 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 33 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-14 00:28:40,488 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-14 00:28:44,950 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-14 00:28:45,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2391730.0, ans=0.125 2024-08-14 00:28:55,592 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-14 00:29:02,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2391830.0, ans=0.0 2024-08-14 00:29:12,529 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 31 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 00:29:13,053 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=4.500e-02 2024-08-14 00:29:35,364 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-14 00:29:43,401 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2392130.0, ans=0.0 2024-08-14 00:29:45,845 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-14 00:29:47,448 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
27 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-14 00:29:50,848 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 7350, loss[loss=0.1126, beats_loss=0.008364, ecapa_loss=0.000173, whisper_loss=0.1025, over 22834.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01073, ecapa_loss=0.0001622, whisper_loss=0.0921, over 3889113.23 frames. ], batch size: 89, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:29:51,169 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 9 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-14 00:29:53,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2392230.0, ans=0.07 2024-08-14 00:29:54,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2392230.0, ans=0.0 2024-08-14 00:29:56,885 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 29 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 00:30:02,789 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.43 vs. limit=22.5 2024-08-14 00:30:05,439 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 19 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 00:30:08,341 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
28 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-14 00:30:08,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2392330.0, ans=0.125 2024-08-14 00:30:13,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2392330.0, ans=0.125 2024-08-14 00:30:20,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2392430.0, ans=0.125 2024-08-14 00:30:28,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2392430.0, ans=0.125 2024-08-14 00:30:32,260 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.397e+01 2.587e+01 2.821e+01 4.137e+01, threshold=5.175e+01, percent-clipped=0.0 2024-08-14 00:30:59,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2392630.0, ans=0.1 2024-08-14 00:31:06,955 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 19 from LS+wenet, 24 from Vox, 47 fro AS 2024-08-14 00:31:11,008 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.69 vs. limit=22.5 2024-08-14 00:31:12,994 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 7400, loss[loss=0.09068, beats_loss=0.01258, ecapa_loss=0.0001264, whisper_loss=0.07684, over 23301.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01076, ecapa_loss=0.0001611, whisper_loss=0.09164, over 3917807.34 frames. ], batch size: 92, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:31:14,619 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-14 00:31:30,556 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
19 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-14 00:31:32,514 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 17 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 00:31:36,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2392830.0, ans=0.125 2024-08-14 00:31:39,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2392830.0, ans=0.0 2024-08-14 00:31:53,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2392930.0, ans=0.0 2024-08-14 00:31:54,442 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-14 00:31:56,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2392930.0, ans=0.2 2024-08-14 00:32:09,099 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 00:32:09,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2393030.0, ans=0.0 2024-08-14 00:32:12,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2393030.0, ans=0.125 2024-08-14 00:32:20,693 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.86 vs. limit=22.5 2024-08-14 00:32:34,304 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 7450, loss[loss=0.0859, beats_loss=0.01135, ecapa_loss=0.0001619, whisper_loss=0.07293, over 17843.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01075, ecapa_loss=0.0001623, whisper_loss=0.0919, over 3926910.04 frames. ], batch size: 74, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:32:34,496 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
27 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-14 00:32:48,986 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 26 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-14 00:33:05,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2393330.0, ans=0.5 2024-08-14 00:33:12,594 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-14 00:33:13,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2393430.0, ans=0.2 2024-08-14 00:33:19,153 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.395e+01 2.642e+01 3.080e+01 4.669e+01, threshold=5.285e+01, percent-clipped=0.0 2024-08-14 00:33:26,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2393530.0, ans=0.0 2024-08-14 00:33:35,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2393530.0, ans=0.0 2024-08-14 00:33:40,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2393630.0, ans=0.125 2024-08-14 00:34:24,308 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 7500, loss[loss=0.1179, beats_loss=0.009193, ecapa_loss=0.0001284, whisper_loss=0.1074, over 21989.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01082, ecapa_loss=0.0001617, whisper_loss=0.09083, over 3905785.18 frames. ], batch size: 82, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:34:37,936 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 00:34:38,721 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.95 vs. 
limit=15.0 2024-08-14 00:34:44,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2393830.0, ans=0.125 2024-08-14 00:34:45,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2393830.0, ans=0.125 2024-08-14 00:34:56,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2393930.0, ans=0.07 2024-08-14 00:35:01,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2393930.0, ans=0.2 2024-08-14 00:35:08,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2393930.0, ans=0.0 2024-08-14 00:35:31,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2394130.0, ans=0.125 2024-08-14 00:35:40,580 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 26 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-14 00:35:44,477 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 7550, loss[loss=0.09522, beats_loss=0.01122, ecapa_loss=0.0001711, whisper_loss=0.08229, over 13595.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01084, ecapa_loss=0.0001617, whisper_loss=0.08972, over 3843951.75 frames. 
], batch size: 54, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:35:45,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2394230.0, ans=0.04949747468305833 2024-08-14 00:35:47,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2394230.0, ans=0.125 2024-08-14 00:36:08,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2394330.0, ans=0.0 2024-08-14 00:36:09,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2394330.0, ans=0.0 2024-08-14 00:36:26,054 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.308e+01 2.563e+01 2.921e+01 3.982e+01, threshold=5.125e+01, percent-clipped=0.0 2024-08-14 00:36:31,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2394430.0, ans=0.125 2024-08-14 00:36:34,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2394530.0, ans=0.125 2024-08-14 00:36:55,257 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 19 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-14 00:36:58,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2394630.0, ans=0.125 2024-08-14 00:37:01,026 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 24 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-14 00:37:02,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2394630.0, ans=0.125 2024-08-14 00:37:05,688 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 7600, loss[loss=0.1247, beats_loss=0.008428, ecapa_loss=0.0001449, whisper_loss=0.1148, over 20537.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01078, ecapa_loss=0.0001622, whisper_loss=0.09036, over 3863843.48 frames. ], batch size: 76, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:37:17,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2394730.0, ans=0.0 2024-08-14 00:37:35,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2394830.0, ans=0.125 2024-08-14 00:37:37,991 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.07 vs. limit=22.5 2024-08-14 00:37:51,247 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 00:37:53,492 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.77 vs. limit=15.0 2024-08-14 00:38:06,394 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-14 00:38:08,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2395130.0, ans=0.125 2024-08-14 00:38:24,408 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 7650, loss[loss=0.08303, beats_loss=0.01184, ecapa_loss=0.0001033, whisper_loss=0.07016, over 15195.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01071, ecapa_loss=0.0001629, whisper_loss=0.09071, over 3871550.77 frames. ], batch size: 58, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:38:30,753 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
28 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-14 00:38:40,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2395330.0, ans=0.125 2024-08-14 00:38:40,926 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.29 vs. limit=10.0 2024-08-14 00:38:45,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2395330.0, ans=0.1 2024-08-14 00:38:47,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2395330.0, ans=0.2 2024-08-14 00:39:03,629 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.16 vs. limit=12.0 2024-08-14 00:39:07,183 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.336e+01 2.593e+01 2.907e+01 5.798e+01, threshold=5.186e+01, percent-clipped=1.0 2024-08-14 00:39:15,333 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.04 vs. 
limit=10.0 2024-08-14 00:39:16,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2395530.0, ans=0.0 2024-08-14 00:39:19,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2395530.0, ans=0.025 2024-08-14 00:39:36,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2395630.0, ans=0.125 2024-08-14 00:39:47,195 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 7700, loss[loss=0.118, beats_loss=0.009551, ecapa_loss=0.0001609, whisper_loss=0.1069, over 24467.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01072, ecapa_loss=0.0001611, whisper_loss=0.09064, over 3877174.65 frames. ], batch size: 94, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:39:56,355 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.72 vs. limit=15.0 2024-08-14 00:40:22,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2395930.0, ans=0.125 2024-08-14 00:40:23,694 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-14 00:40:30,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2395930.0, ans=0.2 2024-08-14 00:40:59,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2396130.0, ans=0.0 2024-08-14 00:41:07,503 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 7750, loss[loss=0.09178, beats_loss=0.01259, ecapa_loss=0.0001278, whisper_loss=0.07792, over 20879.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01081, ecapa_loss=0.0001596, whisper_loss=0.0898, over 3861991.74 frames. 
], batch size: 82, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:41:11,569 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.65 vs. limit=22.5 2024-08-14 00:41:29,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2396330.0, ans=0.125 2024-08-14 00:41:42,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2396430.0, ans=0.2 2024-08-14 00:41:48,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2396430.0, ans=0.125 2024-08-14 00:41:49,004 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.485e+01 2.781e+01 3.099e+01 5.095e+01, threshold=5.562e+01, percent-clipped=0.0 2024-08-14 00:41:49,234 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-14 00:41:49,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2396430.0, ans=0.0 2024-08-14 00:41:49,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2396430.0, ans=0.1 2024-08-14 00:42:01,653 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-14 00:42:01,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2396530.0, ans=0.1 2024-08-14 00:42:03,350 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 38 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 00:42:12,525 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
14 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 00:42:20,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2396630.0, ans=0.125 2024-08-14 00:42:26,652 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 7800, loss[loss=0.1142, beats_loss=0.01137, ecapa_loss=0.0001474, whisper_loss=0.1013, over 22695.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01078, ecapa_loss=0.0001591, whisper_loss=0.08998, over 3851610.94 frames. ], batch size: 87, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:42:27,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2396730.0, ans=0.2 2024-08-14 00:42:36,886 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.44 vs. limit=15.0 2024-08-14 00:42:38,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2396730.0, ans=0.125 2024-08-14 00:42:39,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2396730.0, ans=0.025 2024-08-14 00:42:43,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2396830.0, ans=0.125 2024-08-14 00:42:54,230 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-14 00:43:00,883 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.19 vs. limit=15.0 2024-08-14 00:43:23,155 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.21 vs. limit=15.0 2024-08-14 00:43:34,090 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-14 00:43:35,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2397130.0, ans=0.125 2024-08-14 00:43:46,914 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 00:43:49,894 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 7850, loss[loss=0.1268, beats_loss=0.009687, ecapa_loss=0.000155, whisper_loss=0.1156, over 17241.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01073, ecapa_loss=0.0001598, whisper_loss=0.09067, over 3873649.00 frames. ], batch size: 64, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:43:59,634 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-14 00:44:19,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2397330.0, ans=0.125 2024-08-14 00:44:30,189 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.344e+01 2.594e+01 2.942e+01 8.076e+01, threshold=5.188e+01, percent-clipped=2.0 2024-08-14 00:44:30,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2397430.0, ans=0.125 2024-08-14 00:44:38,365 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 12 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-14 00:44:46,004 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 16 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-14 00:44:56,214 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 00:45:08,991 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 7900, loss[loss=0.09544, beats_loss=0.01579, ecapa_loss=9.488e-05, whisper_loss=0.0787, over 17715.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01081, ecapa_loss=0.0001584, whisper_loss=0.09077, over 3859163.44 frames. ], batch size: 71, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:45:10,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2397730.0, ans=0.2 2024-08-14 00:45:19,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2397730.0, ans=0.0 2024-08-14 00:45:27,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2397830.0, ans=0.0 2024-08-14 00:45:28,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2397830.0, ans=0.2 2024-08-14 00:45:29,158 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.18 vs. limit=15.0 2024-08-14 00:45:37,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2397830.0, ans=0.125 2024-08-14 00:46:12,862 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.32 vs. limit=6.0 2024-08-14 00:46:24,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2398130.0, ans=0.015 2024-08-14 00:46:27,241 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 7950, loss[loss=0.1062, beats_loss=0.009397, ecapa_loss=0.0001573, whisper_loss=0.09523, over 18925.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01079, ecapa_loss=0.0001589, whisper_loss=0.09086, over 3850847.75 frames. 
], batch size: 75, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:46:33,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2398230.0, ans=0.025 2024-08-14 00:46:34,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=2398230.0, ans=22.5 2024-08-14 00:46:36,865 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 00:46:45,642 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 16 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-14 00:47:00,185 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 23 from Vox, 18 fro AS 2024-08-14 00:47:02,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2398430.0, ans=0.125 2024-08-14 00:47:06,102 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.380e+01 2.671e+01 3.071e+01 4.593e+01, threshold=5.341e+01, percent-clipped=0.0 2024-08-14 00:47:09,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=2398430.0, ans=22.5 2024-08-14 00:47:14,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2398530.0, ans=0.2 2024-08-14 00:47:19,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2398530.0, ans=0.0 2024-08-14 00:47:20,179 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. 
limit=6.0 2024-08-14 00:47:38,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2398630.0, ans=0.125 2024-08-14 00:47:41,105 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 8000, loss[loss=0.1087, beats_loss=0.008996, ecapa_loss=0.0001595, whisper_loss=0.09813, over 22566.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01075, ecapa_loss=0.0001594, whisper_loss=0.09087, over 3859071.24 frames. ], batch size: 89, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:47:50,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2398730.0, ans=0.125 2024-08-14 00:48:04,362 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-14 00:48:04,967 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.71 vs. limit=15.0 2024-08-14 00:48:07,691 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-14 00:48:13,629 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 00:48:23,787 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-14 00:48:30,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2399030.0, ans=0.1 2024-08-14 00:48:30,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=2399030.0, ans=0.02 2024-08-14 00:48:35,014 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.83 vs. 
limit=15.0 2024-08-14 00:48:38,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2399030.0, ans=0.2 2024-08-14 00:48:38,614 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=22.5 2024-08-14 00:48:44,937 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. limit=6.0 2024-08-14 00:48:53,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2399130.0, ans=0.0 2024-08-14 00:48:57,347 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 8050, loss[loss=0.09796, beats_loss=0.01023, ecapa_loss=0.000161, whisper_loss=0.08612, over 16017.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0108, ecapa_loss=0.0001584, whisper_loss=0.09066, over 3887336.06 frames. ], batch size: 61, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:49:07,395 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 29 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-14 00:49:09,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2399230.0, ans=0.125 2024-08-14 00:49:11,319 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 22 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-14 00:49:11,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2399330.0, ans=0.0 2024-08-14 00:49:14,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=2399330.0, ans=0.1 2024-08-14 00:49:20,192 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.21 vs. 
limit=15.0 2024-08-14 00:49:28,256 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-14 00:49:33,871 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.422e+01 2.734e+01 3.214e+01 1.918e+02, threshold=5.469e+01, percent-clipped=2.0 2024-08-14 00:49:36,197 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.43 vs. limit=15.0 2024-08-14 00:49:41,066 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.14 vs. limit=15.0 2024-08-14 00:50:03,161 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.32 vs. limit=15.0 2024-08-14 00:50:10,123 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 8100, loss[loss=0.101, beats_loss=0.0116, ecapa_loss=0.000161, whisper_loss=0.08781, over 23131.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01078, ecapa_loss=0.000158, whisper_loss=0.09053, over 3873029.91 frames. ], batch size: 90, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:50:24,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2399730.0, ans=0.95 2024-08-14 00:50:33,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2399830.0, ans=0.2 2024-08-14 00:50:38,673 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-14 00:50:40,284 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
29 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 00:50:50,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2399930.0, ans=0.125 2024-08-14 00:51:28,488 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 00:51:32,792 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 8150, loss[loss=0.09934, beats_loss=0.01056, ecapa_loss=0.0001358, whisper_loss=0.08741, over 16123.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01064, ecapa_loss=0.0001589, whisper_loss=0.09128, over 3894722.04 frames. ], batch size: 61, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:51:47,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2400330.0, ans=0.2 2024-08-14 00:51:50,397 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 24 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-14 00:52:06,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2400430.0, ans=0.125 2024-08-14 00:52:13,262 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=12.0 2024-08-14 00:52:13,556 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.374e+01 2.607e+01 2.976e+01 8.538e+01, threshold=5.213e+01, percent-clipped=1.0 2024-08-14 00:52:17,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2400430.0, ans=0.2 2024-08-14 00:52:33,411 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
20 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-14 00:52:40,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2400630.0, ans=0.125 2024-08-14 00:52:46,813 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 23 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-14 00:52:49,231 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 8200, loss[loss=0.09243, beats_loss=0.009569, ecapa_loss=0.0002038, whisper_loss=0.08082, over 16389.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01062, ecapa_loss=0.0001609, whisper_loss=0.09137, over 3879646.29 frames. ], batch size: 65, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:53:00,489 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-14 00:53:03,084 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 20 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-14 00:53:19,884 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-14 00:53:27,943 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 00:53:45,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2401030.0, ans=0.125 2024-08-14 00:53:54,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2401130.0, ans=0.125 2024-08-14 00:54:05,523 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-14 00:54:06,655 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 8250, loss[loss=0.09071, beats_loss=0.01287, ecapa_loss=0.0001198, whisper_loss=0.07664, over 15716.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01066, ecapa_loss=0.0001598, whisper_loss=0.09138, over 3890943.79 frames. 
], batch size: 62, lr: 3.68e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:54:14,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2401230.0, ans=0.125 2024-08-14 00:54:40,776 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-14 00:54:43,466 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 00:54:46,417 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.415e+01 2.692e+01 3.047e+01 4.213e+01, threshold=5.383e+01, percent-clipped=0.0 2024-08-14 00:54:48,120 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-14 00:54:53,252 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-14 00:55:26,674 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 8300, loss[loss=0.08883, beats_loss=0.009967, ecapa_loss=0.0002033, whisper_loss=0.07683, over 14846.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01067, ecapa_loss=0.0001589, whisper_loss=0.09123, over 3897233.11 frames. ], batch size: 62, lr: 3.68e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:55:28,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2401730.0, ans=0.125 2024-08-14 00:55:38,391 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
21 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-14 00:55:42,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2401830.0, ans=0.0 2024-08-14 00:55:54,041 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.706e-03 2024-08-14 00:56:15,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2402030.0, ans=0.125 2024-08-14 00:56:18,270 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 00:56:38,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2402130.0, ans=0.1 2024-08-14 00:56:52,029 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 8350, loss[loss=0.09296, beats_loss=0.01166, ecapa_loss=0.0001233, whisper_loss=0.08007, over 22248.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01072, ecapa_loss=0.0001608, whisper_loss=0.09067, over 3904339.11 frames. ], batch size: 84, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:56:58,094 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.75 vs. limit=15.0 2024-08-14 00:57:05,860 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.16 vs. 
limit=15.0 2024-08-14 00:57:17,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2402330.0, ans=0.0 2024-08-14 00:57:26,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2402430.0, ans=0.125 2024-08-14 00:57:36,819 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.282e+01 2.635e+01 3.067e+01 5.691e+01, threshold=5.270e+01, percent-clipped=1.0 2024-08-14 00:57:37,012 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 23 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-14 00:58:04,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2402630.0, ans=0.1 2024-08-14 00:58:11,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2402630.0, ans=0.125 2024-08-14 00:58:18,324 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 8400, loss[loss=0.09935, beats_loss=0.00956, ecapa_loss=0.0001628, whisper_loss=0.08816, over 14394.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0107, ecapa_loss=0.0001618, whisper_loss=0.09126, over 3941113.69 frames. ], batch size: 56, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:58:35,206 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 35 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-14 00:58:36,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2402830.0, ans=0.0 2024-08-14 00:58:47,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2402830.0, ans=0.0 2024-08-14 00:58:59,804 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
36 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 00:59:24,039 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-14 00:59:42,686 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 8450, loss[loss=0.09491, beats_loss=0.0116, ecapa_loss=0.0001346, whisper_loss=0.08196, over 14183.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01066, ecapa_loss=0.0001603, whisper_loss=0.09152, over 3933378.06 frames. ], batch size: 56, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:59:48,056 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 01:00:26,981 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.360e+01 2.603e+01 2.918e+01 4.445e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-14 01:00:43,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2403530.0, ans=0.0 2024-08-14 01:00:44,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2403530.0, ans=0.0 2024-08-14 01:00:45,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2403530.0, ans=0.1 2024-08-14 01:01:08,677 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 8500, loss[loss=0.09707, beats_loss=0.01099, ecapa_loss=0.0001644, whisper_loss=0.08443, over 22697.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01069, ecapa_loss=0.0001591, whisper_loss=0.09167, over 3933269.69 frames. 
], batch size: 92, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:01:09,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2403730.0, ans=0.1 2024-08-14 01:01:34,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2403830.0, ans=0.1 2024-08-14 01:01:49,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2403930.0, ans=0.2 2024-08-14 01:01:54,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2403930.0, ans=0.0 2024-08-14 01:02:32,086 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 01:02:33,773 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 8550, loss[loss=0.0918, beats_loss=0.01198, ecapa_loss=0.0001426, whisper_loss=0.07839, over 17582.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01072, ecapa_loss=0.0001584, whisper_loss=0.0911, over 3903764.56 frames. ], batch size: 68, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:02:36,055 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.30 vs. limit=15.0 2024-08-14 01:02:41,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2404230.0, ans=0.125 2024-08-14 01:02:55,906 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 29 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 01:03:00,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2404330.0, ans=0.0 2024-08-14 01:03:13,270 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
24 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-14 01:03:18,041 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.347e+01 2.626e+01 2.928e+01 4.701e+01, threshold=5.252e+01, percent-clipped=0.0 2024-08-14 01:03:52,195 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-14 01:04:02,835 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 8600, loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.0001881, whisper_loss=0.08988, over 14306.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01076, ecapa_loss=0.0001593, whisper_loss=0.09069, over 3916342.45 frames. ], batch size: 57, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:04:06,581 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 01:04:08,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2404730.0, ans=0.0 2024-08-14 01:04:30,097 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-14 01:04:40,659 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 23 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-14 01:05:02,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2405030.0, ans=0.04949747468305833 2024-08-14 01:05:09,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2405030.0, ans=0.2 2024-08-14 01:05:29,654 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 8650, loss[loss=0.09383, beats_loss=0.009037, ecapa_loss=0.0001866, whisper_loss=0.08293, over 16079.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01073, ecapa_loss=0.0001601, whisper_loss=0.09095, over 3889810.36 frames. 
], batch size: 66, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:05:29,859 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 01:05:33,471 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 32 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-14 01:05:37,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2405230.0, ans=0.2 2024-08-14 01:05:39,310 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 28 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 01:05:40,796 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-14 01:05:47,322 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 25 from LS+wenet, 11 from Vox, 20 fro AS 2024-08-14 01:06:10,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2405430.0, ans=0.1 2024-08-14 01:06:14,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2405430.0, ans=0.125 2024-08-14 01:06:15,013 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.307e+01 2.549e+01 2.918e+01 2.030e+02, threshold=5.098e+01, percent-clipped=1.0 2024-08-14 01:06:27,842 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 14 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-14 01:06:38,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2405630.0, ans=0.1 2024-08-14 01:06:49,333 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 01:06:57,243 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
28 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-14 01:06:58,174 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 8700, loss[loss=0.104, beats_loss=0.01138, ecapa_loss=0.0001316, whisper_loss=0.09126, over 22927.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01068, ecapa_loss=0.0001601, whisper_loss=0.09124, over 3892332.45 frames. ], batch size: 89, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:07:01,182 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 01:07:33,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2405930.0, ans=0.125 2024-08-14 01:07:34,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2405930.0, ans=0.125 2024-08-14 01:07:35,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2405930.0, ans=0.125 2024-08-14 01:07:36,258 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.61 vs. limit=15.0 2024-08-14 01:08:05,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2406130.0, ans=0.125 2024-08-14 01:08:07,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2406130.0, ans=0.1 2024-08-14 01:08:07,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2406130.0, ans=0.0 2024-08-14 01:08:22,223 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 8750, loss[loss=0.1215, beats_loss=0.008801, ecapa_loss=0.0001573, whisper_loss=0.1111, over 22098.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01063, ecapa_loss=0.0001607, whisper_loss=0.09137, over 3878841.17 frames. 
], batch size: 86, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:08:22,386 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 18 from LS+wenet, 30 from Vox, 47 fro AS 2024-08-14 01:08:36,481 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 24 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-14 01:08:56,738 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 32 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-14 01:08:57,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2406330.0, ans=0.2 2024-08-14 01:09:11,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2406430.0, ans=0.1 2024-08-14 01:09:14,437 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.356e+01 2.644e+01 3.033e+01 3.229e+02, threshold=5.288e+01, percent-clipped=1.0 2024-08-14 01:09:15,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2406430.0, ans=0.04949747468305833 2024-08-14 01:09:27,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2406530.0, ans=0.125 2024-08-14 01:10:05,265 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 8800, loss[loss=0.08449, beats_loss=0.01155, ecapa_loss=0.0001494, whisper_loss=0.07144, over 18849.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01075, ecapa_loss=0.0001594, whisper_loss=0.09046, over 3862835.32 frames. ], batch size: 73, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:10:07,903 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
21 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-14 01:10:09,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2406730.0, ans=0.09899494936611666 2024-08-14 01:10:17,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2406730.0, ans=0.0 2024-08-14 01:10:28,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2406830.0, ans=0.1 2024-08-14 01:10:28,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2406830.0, ans=0.07 2024-08-14 01:10:28,447 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.28 vs. limit=15.0 2024-08-14 01:10:35,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2406830.0, ans=0.1 2024-08-14 01:11:03,109 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-14 01:11:19,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2407030.0, ans=0.125 2024-08-14 01:11:30,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2407130.0, ans=0.125 2024-08-14 01:11:52,617 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 8850, loss[loss=0.08913, beats_loss=0.0146, ecapa_loss=0.0001167, whisper_loss=0.07336, over 23267.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01083, ecapa_loss=0.000159, whisper_loss=0.08995, over 3882968.02 frames. 
], batch size: 93, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:12:21,903 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=8.984e-02 2024-08-14 01:12:26,613 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 28 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 01:12:31,330 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 17 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-14 01:12:33,609 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-14 01:12:40,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2407430.0, ans=0.2 2024-08-14 01:12:44,686 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 01:12:50,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=2407430.0, ans=0.025 2024-08-14 01:12:50,612 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.72 vs. limit=15.0 2024-08-14 01:12:53,734 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.365e+01 2.669e+01 3.063e+01 4.484e+01, threshold=5.339e+01, percent-clipped=0.0 2024-08-14 01:12:58,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2407430.0, ans=0.125 2024-08-14 01:13:31,473 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-14 01:13:37,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2407630.0, ans=0.125 2024-08-14 01:13:43,496 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
26 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 01:13:45,494 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 8900, loss[loss=0.112, beats_loss=0.009789, ecapa_loss=0.0001658, whisper_loss=0.1005, over 20437.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01079, ecapa_loss=0.0001595, whisper_loss=0.09051, over 3889609.01 frames. ], batch size: 81, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:13:59,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2407730.0, ans=0.1 2024-08-14 01:14:40,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2407930.0, ans=0.2 2024-08-14 01:14:41,435 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 25 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-14 01:14:44,609 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-14 01:15:08,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2408130.0, ans=0.0 2024-08-14 01:15:17,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2408130.0, ans=0.125 2024-08-14 01:15:26,549 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 8950, loss[loss=0.08871, beats_loss=0.01022, ecapa_loss=0.0001408, whisper_loss=0.07708, over 13870.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01077, ecapa_loss=0.0001599, whisper_loss=0.09014, over 3854832.87 frames. ], batch size: 55, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:15:27,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2408230.0, ans=0.2 2024-08-14 01:16:06,817 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
21 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-14 01:16:15,404 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.300e+01 2.488e+01 2.810e+01 4.417e+01, threshold=4.975e+01, percent-clipped=0.0 2024-08-14 01:16:16,811 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 28 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-14 01:16:18,682 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-14 01:16:39,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2408630.0, ans=0.0 2024-08-14 01:16:45,178 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-14 01:16:50,012 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.03 vs. limit=22.5 2024-08-14 01:16:50,479 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 9000, loss[loss=0.1026, beats_loss=0.01194, ecapa_loss=0.0001539, whisper_loss=0.08915, over 21456.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01077, ecapa_loss=0.0001594, whisper_loss=0.09048, over 3867432.17 frames. ], batch size: 87, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:16:50,480 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-14 01:17:30,883 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.0592, 1.4256, 2.9327, 2.7091], device='cuda:2') 2024-08-14 01:17:32,616 INFO [train_multi_KD3.py:1149] (2/4) Epoch 17, validation on ASR_libri: loss=0.2537, beats_loss=0, ecapa_loss=0.0005618, whisper_loss=0.2481, over 922467.00 frames. 
2024-08-14 01:17:46,290 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.1534, 2.0829, 2.1398, 2.5933], device='cuda:2') 2024-08-14 01:17:50,460 INFO [train_multi_KD3.py:1149] (2/4) Epoch 17, validation on SV_voxceleb1: loss=0.004363, beats_loss=0, ecapa_loss=0.0004363, whisper_loss=0, over 939242.00 frames. 2024-08-14 01:20:00,333 INFO [train_multi_KD3.py:1149] (2/4) Epoch 17, validation on AT_audioset: loss=0.02365, beats_loss=0.02365, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 01:20:00,337 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-14 01:20:00,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2408730.0, ans=0.2 2024-08-14 01:20:00,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2408730.0, ans=0.1 2024-08-14 01:20:02,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2408730.0, ans=0.04949747468305833 2024-08-14 01:20:05,008 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.90 vs. limit=10.0 2024-08-14 01:20:15,700 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 25 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-14 01:20:19,291 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 01:20:24,613 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-14 01:20:47,434 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.54 vs. 
limit=15.0 2024-08-14 01:20:50,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2409030.0, ans=0.1 2024-08-14 01:20:51,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2409030.0, ans=0.0 2024-08-14 01:20:58,612 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-14 01:21:05,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2409130.0, ans=0.1 2024-08-14 01:21:13,266 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 9050, loss[loss=0.09646, beats_loss=0.01047, ecapa_loss=0.000186, whisper_loss=0.08412, over 21980.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01077, ecapa_loss=0.0001589, whisper_loss=0.09047, over 3874791.70 frames. ], batch size: 90, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:21:40,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2409330.0, ans=0.0 2024-08-14 01:21:46,291 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-14 01:21:52,856 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.446e+01 2.670e+01 2.988e+01 4.436e+01, threshold=5.340e+01, percent-clipped=0.0 2024-08-14 01:21:57,023 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 01:22:10,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2409530.0, ans=0.125 2024-08-14 01:22:12,436 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-14 01:22:15,564 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 01:22:28,947 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 9100, loss[loss=0.1002, beats_loss=0.009097, ecapa_loss=0.0001475, whisper_loss=0.08965, over 16176.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01073, ecapa_loss=0.00016, whisper_loss=0.09038, over 3867873.23 frames. ], batch size: 63, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:22:29,271 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 01:22:55,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2409830.0, ans=0.0 2024-08-14 01:23:03,113 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 26 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-14 01:23:08,409 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.79 vs. limit=6.0 2024-08-14 01:23:15,362 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.03 vs. limit=15.0 2024-08-14 01:23:19,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2410030.0, ans=0.125 2024-08-14 01:23:22,136 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
19 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 01:23:22,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2410030.0, ans=0.2 2024-08-14 01:23:36,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2410130.0, ans=0.125 2024-08-14 01:23:48,015 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 9150, loss[loss=0.09661, beats_loss=0.01188, ecapa_loss=0.0001455, whisper_loss=0.08328, over 17146.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01078, ecapa_loss=0.00016, whisper_loss=0.09006, over 3870614.49 frames. ], batch size: 68, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:23:54,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2410230.0, ans=0.125 2024-08-14 01:23:57,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2410230.0, ans=0.1 2024-08-14 01:24:09,216 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 23 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-14 01:24:18,587 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
18 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-14 01:24:20,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2410430.0, ans=0.1 2024-08-14 01:24:29,319 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.433e+01 2.654e+01 2.886e+01 8.462e+01, threshold=5.308e+01, percent-clipped=1.0 2024-08-14 01:24:39,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2410530.0, ans=0.0 2024-08-14 01:24:47,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2410530.0, ans=0.1 2024-08-14 01:24:55,826 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 01:24:58,097 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 15 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-14 01:25:00,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2410630.0, ans=0.1 2024-08-14 01:25:07,042 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-14 01:25:08,570 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 9200, loss[loss=0.09819, beats_loss=0.009531, ecapa_loss=0.000178, whisper_loss=0.08687, over 17069.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01081, ecapa_loss=0.0001602, whisper_loss=0.08981, over 3873878.62 frames. ], batch size: 71, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:25:40,030 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 
27 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-14 01:25:47,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2410930.0, ans=0.0 2024-08-14 01:25:56,100 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 19 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-14 01:25:59,667 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-14 01:26:10,127 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-14 01:26:14,008 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.71 vs. limit=15.0 2024-08-14 01:26:15,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2411130.0, ans=0.125 2024-08-14 01:26:30,177 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 9250, loss[loss=0.106, beats_loss=0.009811, ecapa_loss=0.0001631, whisper_loss=0.09453, over 21022.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01078, ecapa_loss=0.0001621, whisper_loss=0.09007, over 3889367.01 frames. ], batch size: 87, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:26:31,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2411230.0, ans=0.2 2024-08-14 01:26:35,093 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-14 01:26:42,874 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
33 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-14 01:26:43,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2411230.0, ans=0.0 2024-08-14 01:26:43,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2411230.0, ans=0.125 2024-08-14 01:26:43,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2411230.0, ans=0.2 2024-08-14 01:27:05,753 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-14 01:27:10,017 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.59 vs. limit=12.0 2024-08-14 01:27:10,510 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.850e+01 2.291e+01 2.608e+01 2.884e+01 5.366e+01, threshold=5.217e+01, percent-clipped=1.0 2024-08-14 01:27:28,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2411530.0, ans=0.2 2024-08-14 01:27:32,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2411630.0, ans=0.1 2024-08-14 01:27:39,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2411630.0, ans=0.2 2024-08-14 01:27:40,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=2411630.0, ans=0.05 2024-08-14 01:27:49,549 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 9300, loss[loss=0.06887, beats_loss=0.01, ecapa_loss=0.0001679, whisper_loss=0.05719, over 14599.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01071, ecapa_loss=0.0001616, whisper_loss=0.09072, over 3889735.10 frames. 
], batch size: 55, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:27:59,047 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 17 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-14 01:28:04,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2411830.0, ans=0.1 2024-08-14 01:28:15,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2411830.0, ans=0.125 2024-08-14 01:28:16,675 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 28 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-14 01:28:29,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2411930.0, ans=0.125 2024-08-14 01:28:37,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2412030.0, ans=0.125 2024-08-14 01:28:40,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2412030.0, ans=0.0 2024-08-14 01:28:50,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2412030.0, ans=10.0 2024-08-14 01:28:53,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2412130.0, ans=0.125 2024-08-14 01:28:59,054 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
21 from LS+wenet, 22 from Vox, 13 fro AS 2024-08-14 01:29:00,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2412130.0, ans=0.05 2024-08-14 01:29:01,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2412130.0, ans=0.125 2024-08-14 01:29:07,613 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 9350, loss[loss=0.09303, beats_loss=0.01279, ecapa_loss=0.0001473, whisper_loss=0.07877, over 22485.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0107, ecapa_loss=0.000162, whisper_loss=0.09086, over 3893528.76 frames. ], batch size: 91, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:29:09,169 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-14 01:29:18,239 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 11 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-14 01:29:18,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2412230.0, ans=0.125 2024-08-14 01:29:28,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2412330.0, ans=0.09899494936611666 2024-08-14 01:29:34,645 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.98 vs. 
limit=15.0 2024-08-14 01:29:47,735 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.279e+01 2.558e+01 2.915e+01 7.467e+01, threshold=5.116e+01, percent-clipped=2.0 2024-08-14 01:30:01,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2412530.0, ans=0.1 2024-08-14 01:30:20,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2412630.0, ans=0.125 2024-08-14 01:30:23,175 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 01:30:26,361 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 9400, loss[loss=0.1066, beats_loss=0.008499, ecapa_loss=0.0001889, whisper_loss=0.09625, over 19720.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01065, ecapa_loss=0.0001624, whisper_loss=0.09145, over 3892243.69 frames. ], batch size: 79, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:30:27,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2412730.0, ans=0.125 2024-08-14 01:30:30,854 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.59 vs. limit=22.5 2024-08-14 01:30:38,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2412730.0, ans=0.04949747468305833 2024-08-14 01:30:48,544 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 39 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-14 01:30:52,150 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
20 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 01:30:55,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2412830.0, ans=0.0 2024-08-14 01:31:03,567 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 37 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-14 01:31:05,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2412930.0, ans=0.2 2024-08-14 01:31:20,037 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 13 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-14 01:31:21,480 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 10 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-14 01:31:29,582 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.27 vs. limit=15.0 2024-08-14 01:31:39,962 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.40 vs. limit=22.5 2024-08-14 01:31:42,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2413130.0, ans=0.125 2024-08-14 01:31:48,945 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 9450, loss[loss=0.09304, beats_loss=0.008676, ecapa_loss=0.0001705, whisper_loss=0.08266, over 15655.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01071, ecapa_loss=0.0001621, whisper_loss=0.09119, over 3901275.41 frames. ], batch size: 63, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:32:02,678 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-14 01:32:04,207 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-14 01:32:07,241 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-14 01:32:30,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2413430.0, ans=0.125 2024-08-14 01:32:35,005 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.507e+01 2.797e+01 3.259e+01 9.131e+01, threshold=5.593e+01, percent-clipped=2.0 2024-08-14 01:32:37,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2413430.0, ans=0.125 2024-08-14 01:33:16,747 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 9500, loss[loss=0.09849, beats_loss=0.007795, ecapa_loss=0.0001659, whisper_loss=0.08904, over 19381.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01078, ecapa_loss=0.0001623, whisper_loss=0.09053, over 3895282.99 frames. ], batch size: 74, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:33:30,288 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.38 vs. limit=15.0 2024-08-14 01:33:38,968 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 22 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-14 01:33:39,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2413830.0, ans=0.125 2024-08-14 01:33:41,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2413830.0, ans=0.1 2024-08-14 01:33:51,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2413930.0, ans=0.1 2024-08-14 01:33:54,332 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
25 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-14 01:34:36,771 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 9550, loss[loss=0.09859, beats_loss=0.009155, ecapa_loss=0.0001756, whisper_loss=0.08768, over 15620.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01081, ecapa_loss=0.0001614, whisper_loss=0.09033, over 3910052.57 frames. ], batch size: 63, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:34:43,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2414230.0, ans=0.125 2024-08-14 01:34:44,824 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 34 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-14 01:35:05,387 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-14 01:35:08,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2414430.0, ans=0.125 2024-08-14 01:35:11,381 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-14 01:35:17,558 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.395e+01 2.666e+01 3.161e+01 6.328e+01, threshold=5.331e+01, percent-clipped=1.0 2024-08-14 01:35:23,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2414530.0, ans=0.125 2024-08-14 01:35:24,236 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.53 vs. limit=10.0 2024-08-14 01:35:25,868 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-14 01:35:26,967 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 01:35:27,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2414530.0, ans=0.0 2024-08-14 01:35:29,135 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0 2024-08-14 01:35:39,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2414630.0, ans=0.2 2024-08-14 01:35:43,887 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-14 01:35:47,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2414630.0, ans=0.0 2024-08-14 01:35:57,736 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 9600, loss[loss=0.08699, beats_loss=0.008642, ecapa_loss=0.0001741, whisper_loss=0.07661, over 15837.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01076, ecapa_loss=0.0001608, whisper_loss=0.09075, over 3882475.10 frames. ], batch size: 62, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:36:02,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2414730.0, ans=0.125 2024-08-14 01:36:11,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2414730.0, ans=0.125 2024-08-14 01:36:16,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2414830.0, ans=0.0 2024-08-14 01:36:20,688 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-14 01:36:29,365 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.86 vs. 
limit=12.0 2024-08-14 01:36:40,215 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-14 01:36:45,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2414930.0, ans=0.09899494936611666 2024-08-14 01:36:45,446 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.25 vs. limit=6.0 2024-08-14 01:36:51,511 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=15.0 2024-08-14 01:37:01,011 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.58 vs. limit=15.0 2024-08-14 01:37:26,354 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 9650, loss[loss=0.0969, beats_loss=0.0113, ecapa_loss=0.0001118, whisper_loss=0.08448, over 16175.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01072, ecapa_loss=0.0001602, whisper_loss=0.09042, over 3864441.03 frames. ], batch size: 61, lr: 3.66e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:37:31,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2415230.0, ans=0.125 2024-08-14 01:37:33,827 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 17 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 01:37:35,417 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-14 01:37:46,225 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.37 vs. limit=22.5 2024-08-14 01:37:51,521 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
22 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-14 01:38:09,239 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.345e+01 2.616e+01 2.966e+01 4.263e+01, threshold=5.231e+01, percent-clipped=0.0 2024-08-14 01:38:22,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2415530.0, ans=0.125 2024-08-14 01:38:25,144 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 11 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-14 01:38:48,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2415730.0, ans=0.0 2024-08-14 01:38:49,112 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 9700, loss[loss=0.08184, beats_loss=0.009973, ecapa_loss=0.0001939, whisper_loss=0.06993, over 15369.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01065, ecapa_loss=0.000161, whisper_loss=0.09057, over 3869714.22 frames. ], batch size: 62, lr: 3.66e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:38:58,637 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
25 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-14 01:39:01,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2415730.0, ans=0.125 2024-08-14 01:39:02,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2415730.0, ans=0.125 2024-08-14 01:39:15,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2415830.0, ans=10.0 2024-08-14 01:39:17,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2415830.0, ans=0.035 2024-08-14 01:39:41,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2416030.0, ans=0.125 2024-08-14 01:39:44,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2416030.0, ans=0.125 2024-08-14 01:39:49,108 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=6.136e-01 2024-08-14 01:39:53,290 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.59 vs. limit=10.0 2024-08-14 01:39:55,826 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2416130.0, ans=0.07 2024-08-14 01:40:09,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2416230.0, ans=0.125 2024-08-14 01:40:10,297 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 9750, loss[loss=0.1097, beats_loss=0.01156, ecapa_loss=0.0001455, whisper_loss=0.0967, over 23735.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01072, ecapa_loss=0.0001605, whisper_loss=0.09026, over 3883297.04 frames. ], batch size: 92, lr: 3.66e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:40:13,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2416230.0, ans=0.2 2024-08-14 01:40:15,299 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.80 vs. limit=6.0 2024-08-14 01:40:23,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2416230.0, ans=0.2 2024-08-14 01:40:37,595 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 16 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-14 01:40:41,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2416430.0, ans=0.125 2024-08-14 01:40:49,426 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-14 01:40:51,022 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.366e+01 2.693e+01 3.078e+01 7.887e+01, threshold=5.385e+01, percent-clipped=1.0 2024-08-14 01:40:51,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2416430.0, ans=0.07 2024-08-14 01:40:54,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2416430.0, ans=0.125 2024-08-14 01:41:15,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2416630.0, ans=0.0 2024-08-14 01:41:15,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2416630.0, ans=0.0 2024-08-14 01:41:26,876 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 9800, loss[loss=0.1002, beats_loss=0.01161, ecapa_loss=0.0001384, whisper_loss=0.08721, over 22483.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01066, ecapa_loss=0.0001604, whisper_loss=0.09068, over 3875403.12 frames. ], batch size: 90, lr: 3.66e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:41:36,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2416730.0, ans=0.125 2024-08-14 01:41:44,493 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-14 01:41:44,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2416830.0, ans=0.125 2024-08-14 01:41:46,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2416830.0, ans=0.0 2024-08-14 01:41:54,863 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-14 01:42:01,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2416930.0, ans=0.125 2024-08-14 01:42:12,009 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 22 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-14 01:42:37,519 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 9850, loss[loss=0.1195, beats_loss=0.01001, ecapa_loss=0.0002013, whisper_loss=0.1074, over 21340.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01074, ecapa_loss=0.0001594, whisper_loss=0.09029, over 3856262.21 frames. ], batch size: 90, lr: 3.66e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:42:54,669 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 01:43:04,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2417430.0, ans=0.0 2024-08-14 01:43:06,301 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 01:43:09,092 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 22 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-14 01:43:11,529 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.716e+01 2.319e+01 2.530e+01 2.883e+01 5.906e+01, threshold=5.059e+01, percent-clipped=1.0 2024-08-14 01:43:18,535 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 16 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 01:43:27,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2417530.0, ans=0.125 2024-08-14 01:43:37,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2417630.0, ans=0.125 2024-08-14 01:43:39,203 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
16 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-14 01:43:44,579 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 9900, loss[loss=0.1066, beats_loss=0.008068, ecapa_loss=0.0002313, whisper_loss=0.09621, over 17556.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01076, ecapa_loss=0.0001598, whisper_loss=0.09047, over 3870790.53 frames. ], batch size: 76, lr: 3.66e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:43:46,671 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.91 vs. limit=22.5 2024-08-14 01:43:47,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2417730.0, ans=0.125 2024-08-14 01:43:58,045 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-14 01:44:04,816 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-14 01:44:19,705 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-14 01:44:21,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2417930.0, ans=0.0 2024-08-14 01:44:21,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2417930.0, ans=0.0 2024-08-14 01:44:35,080 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 22 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-14 01:44:36,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2418030.0, ans=0.125 2024-08-14 01:44:52,650 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 9950, loss[loss=0.09416, beats_loss=0.0114, ecapa_loss=0.0001573, whisper_loss=0.08119, over 17391.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01076, ecapa_loss=0.0001607, whisper_loss=0.0906, over 3880922.75 frames. ], batch size: 68, lr: 3.66e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:45:26,873 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.412e+01 2.652e+01 3.138e+01 4.371e+01, threshold=5.303e+01, percent-clipped=0.0 2024-08-14 01:45:29,684 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-14 01:45:41,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2418530.0, ans=0.125 2024-08-14 01:45:59,854 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 10000, loss[loss=0.1032, beats_loss=0.008895, ecapa_loss=0.0001649, whisper_loss=0.09262, over 18337.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01078, ecapa_loss=0.00016, whisper_loss=0.09107, over 3881459.24 frames. ], batch size: 71, lr: 3.66e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:46:05,311 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-14 01:46:19,611 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.16 vs. limit=22.5 2024-08-14 01:46:20,323 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 01:46:35,941 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 01:46:46,553 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
24 from LS+wenet, 34 from Vox, 35 fro AS 2024-08-14 01:46:54,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2419130.0, ans=0.125 2024-08-14 01:47:02,103 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.92 vs. limit=15.0 2024-08-14 01:47:06,591 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 10050, loss[loss=0.1013, beats_loss=0.01197, ecapa_loss=0.0001513, whisper_loss=0.08779, over 22611.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01078, ecapa_loss=0.0001598, whisper_loss=0.09069, over 3876905.97 frames. ], batch size: 94, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:47:07,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2419230.0, ans=0.1 2024-08-14 01:47:11,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2419230.0, ans=0.1 2024-08-14 01:47:22,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2419330.0, ans=0.125 2024-08-14 01:47:24,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2419330.0, ans=0.1 2024-08-14 01:47:29,635 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-14 01:47:32,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2419430.0, ans=0.125 2024-08-14 01:47:36,436 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
26 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 01:47:38,041 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 01:47:42,743 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.962e+01 2.384e+01 2.686e+01 2.960e+01 2.282e+02, threshold=5.371e+01, percent-clipped=3.0 2024-08-14 01:47:47,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2419530.0, ans=0.09899494936611666 2024-08-14 01:47:49,707 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 28 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 01:48:14,342 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 10100, loss[loss=0.09571, beats_loss=0.01146, ecapa_loss=0.0001566, whisper_loss=0.08269, over 21801.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01078, ecapa_loss=0.0001598, whisper_loss=0.09069, over 3871766.97 frames. ], batch size: 90, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:48:17,048 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.55 vs. 
limit=22.5 2024-08-14 01:48:24,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2419730.0, ans=0.0 2024-08-14 01:48:31,343 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 01:48:53,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2419930.0, ans=0.2 2024-08-14 01:49:14,903 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.188e-03 2024-08-14 01:49:24,081 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 10150, loss[loss=0.1194, beats_loss=0.01052, ecapa_loss=0.00012, whisper_loss=0.1077, over 22482.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01075, ecapa_loss=0.0001608, whisper_loss=0.0914, over 3869312.25 frames. ], batch size: 83, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:49:57,199 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 27 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-14 01:50:05,426 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.404e+01 2.645e+01 2.951e+01 4.259e+01, threshold=5.291e+01, percent-clipped=0.0 2024-08-14 01:50:09,720 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.49 vs. limit=6.0 2024-08-14 01:50:15,792 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
20 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-14 01:50:28,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2420630.0, ans=0.125 2024-08-14 01:50:41,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2420630.0, ans=0.125 2024-08-14 01:50:42,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2420730.0, ans=0.125 2024-08-14 01:50:43,660 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 10200, loss[loss=0.1101, beats_loss=0.01148, ecapa_loss=0.0001528, whisper_loss=0.09709, over 22517.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0107, ecapa_loss=0.000161, whisper_loss=0.09141, over 3859178.14 frames. ], batch size: 90, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:50:43,771 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 01:50:48,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2420730.0, ans=0.0 2024-08-14 01:50:49,484 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.79 vs. 
limit=22.5 2024-08-14 01:51:15,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2420830.0, ans=0.0 2024-08-14 01:51:15,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2420830.0, ans=0.1 2024-08-14 01:51:22,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2420930.0, ans=0.125 2024-08-14 01:51:36,203 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0 2024-08-14 01:51:58,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2421130.0, ans=0.015 2024-08-14 01:52:01,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2421130.0, ans=0.125 2024-08-14 01:52:08,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2421130.0, ans=0.0 2024-08-14 01:52:13,237 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 10250, loss[loss=0.1102, beats_loss=0.01052, ecapa_loss=0.0001556, whisper_loss=0.09812, over 23359.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01069, ecapa_loss=0.0001608, whisper_loss=0.09119, over 3853404.77 frames. ], batch size: 92, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:52:37,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2421330.0, ans=0.0 2024-08-14 01:52:41,083 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-14 01:52:55,666 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
15 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-14 01:53:01,398 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.474e+01 2.733e+01 3.124e+01 2.948e+02, threshold=5.467e+01, percent-clipped=2.0 2024-08-14 01:53:11,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2421530.0, ans=0.125 2024-08-14 01:53:27,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2421630.0, ans=0.0 2024-08-14 01:53:30,467 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 22 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 01:53:41,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2421730.0, ans=0.125 2024-08-14 01:53:42,562 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 10300, loss[loss=0.08721, beats_loss=0.009935, ecapa_loss=0.0001784, whisper_loss=0.07549, over 17453.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0107, ecapa_loss=0.0001608, whisper_loss=0.09076, over 3832926.91 frames. ], batch size: 71, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:53:44,253 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. 
limit=6.0 2024-08-14 01:53:48,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2421730.0, ans=0.1 2024-08-14 01:54:25,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2421930.0, ans=0.09899494936611666 2024-08-14 01:54:57,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2422130.0, ans=0.0 2024-08-14 01:55:01,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2422130.0, ans=0.0 2024-08-14 01:55:06,904 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 15 from LS+wenet, 27 from Vox, 50 fro AS 2024-08-14 01:55:11,002 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.32 vs. limit=22.5 2024-08-14 01:55:11,352 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 10350, loss[loss=0.1233, beats_loss=0.007708, ecapa_loss=0.0001654, whisper_loss=0.1139, over 24024.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01075, ecapa_loss=0.0001607, whisper_loss=0.09015, over 3852889.49 frames. ], batch size: 92, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:55:18,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2422230.0, ans=0.125 2024-08-14 01:55:56,411 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.357e+01 2.601e+01 3.091e+01 4.779e+01, threshold=5.203e+01, percent-clipped=0.0 2024-08-14 01:55:58,277 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
32 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 01:55:58,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2422430.0, ans=0.125 2024-08-14 01:56:05,711 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 19 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-14 01:56:11,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2422530.0, ans=0.0 2024-08-14 01:56:14,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2422530.0, ans=0.125 2024-08-14 01:56:20,170 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-14 01:56:24,758 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-14 01:56:32,537 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 10400, loss[loss=0.08616, beats_loss=0.01397, ecapa_loss=0.0001229, whisper_loss=0.07095, over 22011.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01071, ecapa_loss=0.0001606, whisper_loss=0.09021, over 3858646.23 frames. 
], batch size: 90, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:56:36,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2422730.0, ans=0.0 2024-08-14 01:56:50,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2422830.0, ans=0.0 2024-08-14 01:57:01,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2422930.0, ans=0.125 2024-08-14 01:57:04,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2422930.0, ans=0.025 2024-08-14 01:57:07,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2422930.0, ans=0.125 2024-08-14 01:57:36,876 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0 2024-08-14 01:57:41,597 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 10450, loss[loss=0.1077, beats_loss=0.01068, ecapa_loss=0.000145, whisper_loss=0.09558, over 24267.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01075, ecapa_loss=0.0001594, whisper_loss=0.09059, over 3874225.77 frames. ], batch size: 95, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:57:47,256 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
32 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-14 01:57:53,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2423230.0, ans=0.2 2024-08-14 01:58:02,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2423330.0, ans=0.0 2024-08-14 01:58:10,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2423430.0, ans=0.05 2024-08-14 01:58:14,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2423430.0, ans=0.125 2024-08-14 01:58:16,724 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.463e+01 2.702e+01 3.082e+01 4.541e+01, threshold=5.404e+01, percent-clipped=0.0 2024-08-14 01:58:28,916 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-14 01:58:45,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2423630.0, ans=0.125 2024-08-14 01:58:45,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2423630.0, ans=6.0 2024-08-14 01:58:47,462 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 10500, loss[loss=0.1214, beats_loss=0.00925, ecapa_loss=0.0001803, whisper_loss=0.1103, over 21135.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01078, ecapa_loss=0.0001594, whisper_loss=0.09017, over 3870885.25 frames. 
], batch size: 86, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:59:23,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2423930.0, ans=0.125 2024-08-14 01:59:52,583 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 10550, loss[loss=0.08448, beats_loss=0.01013, ecapa_loss=0.000184, whisper_loss=0.07252, over 13776.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01076, ecapa_loss=0.0001601, whisper_loss=0.08999, over 3867411.95 frames. ], batch size: 55, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:00:21,652 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-08-14 02:00:24,615 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-14 02:00:26,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2424430.0, ans=0.125 2024-08-14 02:00:27,174 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.316e+01 2.599e+01 2.857e+01 9.329e+01, threshold=5.198e+01, percent-clipped=3.0 2024-08-14 02:00:29,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=2424430.0, ans=15.0 2024-08-14 02:00:32,526 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 25 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-14 02:00:38,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2424530.0, ans=0.05 2024-08-14 02:00:38,391 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.45 vs. 
limit=15.0 2024-08-14 02:00:49,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2424630.0, ans=0.0 2024-08-14 02:00:57,044 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 10600, loss[loss=0.1033, beats_loss=0.01539, ecapa_loss=9.947e-05, whisper_loss=0.08695, over 18089.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01074, ecapa_loss=0.0001598, whisper_loss=0.09045, over 3901589.72 frames. ], batch size: 71, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:01:09,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2424830.0, ans=0.125 2024-08-14 02:01:09,696 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.44 vs. limit=10.0 2024-08-14 02:01:24,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2424930.0, ans=0.0 2024-08-14 02:01:47,372 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-14 02:01:51,522 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-14 02:01:53,940 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 18 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 02:02:01,597 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 10650, loss[loss=0.123, beats_loss=0.007601, ecapa_loss=0.0001846, whisper_loss=0.1136, over 16862.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01078, ecapa_loss=0.0001581, whisper_loss=0.09041, over 3885069.87 frames. 
], batch size: 68, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:02:04,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2425230.0, ans=0.0 2024-08-14 02:02:08,790 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.77 vs. limit=10.0 2024-08-14 02:02:16,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2425330.0, ans=0.0 2024-08-14 02:02:23,983 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-14 02:02:25,412 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-14 02:02:28,022 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 33 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 02:02:29,240 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 27 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-14 02:02:37,118 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.364e+01 2.670e+01 2.895e+01 4.194e+01, threshold=5.340e+01, percent-clipped=0.0 2024-08-14 02:02:44,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2425530.0, ans=0.0 2024-08-14 02:02:59,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2425630.0, ans=0.2 2024-08-14 02:03:07,480 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 10700, loss[loss=0.1148, beats_loss=0.009254, ecapa_loss=0.0001344, whisper_loss=0.1042, over 23617.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01079, ecapa_loss=0.0001568, whisper_loss=0.09081, over 3879015.33 frames. 
], batch size: 89, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:03:07,615 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 22 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-14 02:03:14,555 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.36 vs. limit=22.5 2024-08-14 02:03:15,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2425730.0, ans=0.125 2024-08-14 02:03:17,947 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-14 02:03:19,171 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 02:03:20,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2425830.0, ans=0.125 2024-08-14 02:03:42,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2425930.0, ans=0.125 2024-08-14 02:03:47,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2426030.0, ans=0.125 2024-08-14 02:04:00,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2426130.0, ans=0.125 2024-08-14 02:04:13,584 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 10750, loss[loss=0.0958, beats_loss=0.01301, ecapa_loss=0.0001848, whisper_loss=0.08093, over 21448.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01076, ecapa_loss=0.0001577, whisper_loss=0.09147, over 3872992.40 frames. 
], batch size: 90, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:04:19,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2426230.0, ans=0.125 2024-08-14 02:04:25,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2426330.0, ans=0.125 2024-08-14 02:04:49,435 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.445e+01 2.667e+01 2.966e+01 4.209e+01, threshold=5.334e+01, percent-clipped=0.0 2024-08-14 02:05:09,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2426630.0, ans=0.0 2024-08-14 02:05:11,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2426630.0, ans=0.95 2024-08-14 02:05:20,538 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 10800, loss[loss=0.1079, beats_loss=0.00929, ecapa_loss=0.0001629, whisper_loss=0.09694, over 14665.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01078, ecapa_loss=0.0001572, whisper_loss=0.0914, over 3912284.28 frames. ], batch size: 58, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:05:22,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2426730.0, ans=0.1 2024-08-14 02:05:29,603 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.05 vs. limit=22.5 2024-08-14 02:05:31,470 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
17 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-14 02:05:31,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2426730.0, ans=0.2 2024-08-14 02:05:44,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2426830.0, ans=0.0 2024-08-14 02:05:56,456 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 02:06:09,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2427030.0, ans=0.0 2024-08-14 02:06:14,492 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 02:06:39,696 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 10850, loss[loss=0.1188, beats_loss=0.007484, ecapa_loss=0.0001608, whisper_loss=0.1097, over 18629.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01072, ecapa_loss=0.000158, whisper_loss=0.09218, over 3916504.32 frames. ], batch size: 68, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:06:57,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2427330.0, ans=0.125 2024-08-14 02:06:58,185 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
30 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 02:07:02,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2427330.0, ans=0.125 2024-08-14 02:07:02,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2427330.0, ans=0.0 2024-08-14 02:07:21,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2427430.0, ans=0.2 2024-08-14 02:07:22,001 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.381e+01 2.677e+01 3.006e+01 4.441e+01, threshold=5.355e+01, percent-clipped=0.0 2024-08-14 02:07:27,244 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-14 02:07:48,679 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 24 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-14 02:07:56,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2427730.0, ans=0.125 2024-08-14 02:07:57,461 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 10900, loss[loss=0.1046, beats_loss=0.01301, ecapa_loss=0.000121, whisper_loss=0.09043, over 19305.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01078, ecapa_loss=0.0001579, whisper_loss=0.09179, over 3917522.19 frames. ], batch size: 75, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:08:16,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2427830.0, ans=0.0 2024-08-14 02:08:21,291 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
16 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 02:08:28,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2427930.0, ans=0.125 2024-08-14 02:08:31,123 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0 2024-08-14 02:08:46,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2428030.0, ans=0.125 2024-08-14 02:09:06,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2428130.0, ans=0.125 2024-08-14 02:09:09,157 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 10950, loss[loss=0.09116, beats_loss=0.009232, ecapa_loss=0.0001647, whisper_loss=0.08028, over 14587.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01076, ecapa_loss=0.0001582, whisper_loss=0.09168, over 3924074.95 frames. ], batch size: 57, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:09:23,077 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 27 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-14 02:09:31,997 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.51 vs. 
limit=15.0 2024-08-14 02:09:41,425 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.427e+01 2024-08-14 02:09:45,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2428430.0, ans=0.125 2024-08-14 02:09:46,058 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.387e+01 2.678e+01 3.232e+01 4.538e+01, threshold=5.356e+01, percent-clipped=0.0 2024-08-14 02:09:54,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2428530.0, ans=0.125 2024-08-14 02:10:09,557 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 02:10:17,041 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 11000, loss[loss=0.0891, beats_loss=0.01363, ecapa_loss=0.0001246, whisper_loss=0.07422, over 15450.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01072, ecapa_loss=0.0001583, whisper_loss=0.09163, over 3909613.69 frames. ], batch size: 59, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:10:19,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2428730.0, ans=0.0 2024-08-14 02:10:22,212 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-14 02:10:23,278 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.39 vs. limit=12.0 2024-08-14 02:10:24,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2428730.0, ans=0.5 2024-08-14 02:10:28,043 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 02:10:35,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2428830.0, ans=0.07 2024-08-14 02:10:44,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2428930.0, ans=0.125 2024-08-14 02:10:54,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2428930.0, ans=0.125 2024-08-14 02:10:58,678 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-14 02:10:59,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2429030.0, ans=0.125 2024-08-14 02:11:04,521 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.09 vs. limit=15.0 2024-08-14 02:11:11,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2429130.0, ans=0.125 2024-08-14 02:11:14,184 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-14 02:11:18,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2429130.0, ans=0.125 2024-08-14 02:11:20,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2429130.0, ans=0.125 2024-08-14 02:11:22,016 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 02:11:23,100 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 11050, loss[loss=0.09891, beats_loss=0.01232, ecapa_loss=0.0001519, whisper_loss=0.08506, over 21877.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01076, ecapa_loss=0.0001583, whisper_loss=0.09152, over 3934560.60 frames. ], batch size: 90, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:11:28,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2429230.0, ans=0.125 2024-08-14 02:11:30,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2429230.0, ans=0.125 2024-08-14 02:11:46,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2429330.0, ans=0.125 2024-08-14 02:11:54,645 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 24 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 02:11:58,553 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.682e+01 2.348e+01 2.595e+01 2.854e+01 6.191e+01, threshold=5.189e+01, percent-clipped=1.0 2024-08-14 02:12:07,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2429530.0, ans=0.0 2024-08-14 02:12:08,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2429530.0, ans=0.0 2024-08-14 02:12:09,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2429530.0, ans=0.0 2024-08-14 02:12:15,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2429630.0, ans=0.04949747468305833 2024-08-14 02:12:15,992 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.16 vs. 
limit=15.0 2024-08-14 02:12:16,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2429630.0, ans=6.0 2024-08-14 02:12:25,931 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-14 02:12:26,710 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0 2024-08-14 02:12:27,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2429730.0, ans=0.95 2024-08-14 02:12:28,376 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 11100, loss[loss=0.09967, beats_loss=0.009225, ecapa_loss=0.0001675, whisper_loss=0.08877, over 17266.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01071, ecapa_loss=0.0001599, whisper_loss=0.09115, over 3900725.43 frames. ], batch size: 66, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:12:36,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2429730.0, ans=0.5 2024-08-14 02:12:37,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2429730.0, ans=0.125 2024-08-14 02:13:00,483 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 30 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 02:13:02,977 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 19 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 02:13:10,117 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 02:13:11,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2430030.0, ans=0.0 2024-08-14 02:13:26,192 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
19 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-14 02:13:27,454 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-14 02:13:35,892 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 11150, loss[loss=0.1074, beats_loss=0.01012, ecapa_loss=0.0001681, whisper_loss=0.09561, over 22613.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01078, ecapa_loss=0.0001586, whisper_loss=0.09057, over 3905093.33 frames. ], batch size: 90, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:13:37,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2430230.0, ans=0.0 2024-08-14 02:13:58,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2430330.0, ans=0.1 2024-08-14 02:14:10,479 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.09 vs. limit=22.5 2024-08-14 02:14:12,351 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.319e+01 2.556e+01 2.861e+01 3.873e+01, threshold=5.113e+01, percent-clipped=0.0 2024-08-14 02:14:25,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2430530.0, ans=0.09899494936611666 2024-08-14 02:14:34,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2430630.0, ans=0.0 2024-08-14 02:14:43,535 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 11200, loss[loss=0.09198, beats_loss=0.01244, ecapa_loss=0.0001318, whisper_loss=0.07822, over 15660.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01068, ecapa_loss=0.0001588, whisper_loss=0.09141, over 3896609.85 frames. 
], batch size: 62, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:14:45,342 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-14 02:14:53,439 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 02:15:15,973 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 23 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-14 02:15:38,752 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-14 02:15:44,310 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 22 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-14 02:15:49,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2431230.0, ans=0.125 2024-08-14 02:15:50,378 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 11250, loss[loss=0.1059, beats_loss=0.01059, ecapa_loss=0.0001475, whisper_loss=0.09379, over 24221.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01069, ecapa_loss=0.0001588, whisper_loss=0.09145, over 3889585.47 frames. ], batch size: 93, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:15:56,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2431230.0, ans=0.2 2024-08-14 02:16:06,891 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 02:16:07,764 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.96 vs. 
limit=12.0 2024-08-14 02:16:16,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2431430.0, ans=0.125 2024-08-14 02:16:27,754 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.401e+01 2.755e+01 3.055e+01 4.281e+01, threshold=5.509e+01, percent-clipped=0.0 2024-08-14 02:16:32,985 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-14 02:16:37,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2431530.0, ans=0.09899494936611666 2024-08-14 02:16:39,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2431530.0, ans=0.04949747468305833 2024-08-14 02:16:43,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2431630.0, ans=0.0 2024-08-14 02:16:51,143 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.47 vs. limit=15.0 2024-08-14 02:16:57,993 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 11300, loss[loss=0.08114, beats_loss=0.01027, ecapa_loss=0.0002098, whisper_loss=0.06877, over 17802.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01068, ecapa_loss=0.0001583, whisper_loss=0.09182, over 3871305.89 frames. ], batch size: 76, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:16:59,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2431730.0, ans=0.125 2024-08-14 02:17:23,223 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
38 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-14 02:17:28,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2431930.0, ans=0.125 2024-08-14 02:17:30,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2431930.0, ans=0.125 2024-08-14 02:17:33,314 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.16 vs. limit=22.5 2024-08-14 02:17:50,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2432130.0, ans=0.1 2024-08-14 02:18:04,283 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 11350, loss[loss=0.108, beats_loss=0.01043, ecapa_loss=0.0001561, whisper_loss=0.09597, over 22221.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0107, ecapa_loss=0.0001589, whisper_loss=0.09134, over 3879681.68 frames. ], batch size: 88, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:18:15,811 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-14 02:18:19,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2432330.0, ans=0.125 2024-08-14 02:18:26,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2432330.0, ans=0.125 2024-08-14 02:18:30,431 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-14 02:18:32,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2432430.0, ans=0.0 2024-08-14 02:18:40,214 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
16 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-14 02:18:41,300 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.318e+01 2.543e+01 2.878e+01 4.882e+01, threshold=5.086e+01, percent-clipped=0.0 2024-08-14 02:18:43,181 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.09 vs. limit=15.0 2024-08-14 02:18:48,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2432530.0, ans=0.0 2024-08-14 02:18:51,210 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 24 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-14 02:19:00,901 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.05 vs. limit=15.0 2024-08-14 02:19:11,463 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 11400, loss[loss=0.1197, beats_loss=0.00852, ecapa_loss=0.0001496, whisper_loss=0.1097, over 24215.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01073, ecapa_loss=0.000158, whisper_loss=0.09129, over 3879644.40 frames. ], batch size: 89, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:19:15,877 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 17 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-14 02:19:42,197 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=15.0 2024-08-14 02:19:48,768 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.13 vs. 
limit=15.0 2024-08-14 02:19:51,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2433030.0, ans=0.09899494936611666 2024-08-14 02:19:52,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2433030.0, ans=0.09899494936611666 2024-08-14 02:19:54,249 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.16 vs. limit=10.0 2024-08-14 02:20:18,431 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 11450, loss[loss=0.1142, beats_loss=0.01076, ecapa_loss=0.0001473, whisper_loss=0.102, over 14692.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01079, ecapa_loss=0.000157, whisper_loss=0.09115, over 3885748.55 frames. ], batch size: 58, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:20:56,910 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.469e+01 2.659e+01 2.977e+01 4.724e+01, threshold=5.318e+01, percent-clipped=0.0 2024-08-14 02:20:57,066 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 13 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-14 02:20:57,700 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.53 vs. limit=15.0 2024-08-14 02:21:09,156 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 16 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 02:21:15,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2433630.0, ans=0.125 2024-08-14 02:21:16,103 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
21 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 02:21:27,538 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 11500, loss[loss=0.08145, beats_loss=0.01093, ecapa_loss=0.000155, whisper_loss=0.06896, over 15907.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01078, ecapa_loss=0.0001578, whisper_loss=0.09106, over 3895704.31 frames. ], batch size: 65, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:21:37,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2433730.0, ans=0.125 2024-08-14 02:21:44,971 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 02:21:52,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2433830.0, ans=0.125 2024-08-14 02:21:54,469 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-14 02:22:00,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2433930.0, ans=0.1 2024-08-14 02:22:10,504 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 02:22:13,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2434030.0, ans=0.125 2024-08-14 02:22:20,293 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.29 vs. limit=15.0 2024-08-14 02:22:25,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2434130.0, ans=0.2 2024-08-14 02:22:34,097 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 11550, loss[loss=0.09605, beats_loss=0.01262, ecapa_loss=0.0001442, whisper_loss=0.08199, over 20767.00 frames. 
], tot_loss[loss=0.1042, beats_loss=0.01064, ecapa_loss=0.0001589, whisper_loss=0.09198, over 3907543.32 frames. ], batch size: 88, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:22:57,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2434330.0, ans=0.05 2024-08-14 02:23:05,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2434430.0, ans=0.0 2024-08-14 02:23:10,241 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.439e+01 2.716e+01 3.080e+01 3.847e+02, threshold=5.432e+01, percent-clipped=2.0 2024-08-14 02:23:20,359 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-14 02:23:41,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2434730.0, ans=0.2 2024-08-14 02:23:41,908 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 11600, loss[loss=0.1002, beats_loss=0.01165, ecapa_loss=0.0001251, whisper_loss=0.08731, over 20948.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01061, ecapa_loss=0.0001587, whisper_loss=0.09214, over 3917988.74 frames. ], batch size: 80, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:23:47,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2434730.0, ans=0.0 2024-08-14 02:24:11,553 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-14 02:24:20,736 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-14 02:24:27,707 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-14 02:24:28,247 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.75 vs. limit=15.0 2024-08-14 02:24:37,393 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.131e+01 2024-08-14 02:24:37,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2435130.0, ans=0.0 2024-08-14 02:24:47,637 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 29 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-14 02:24:52,711 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 11650, loss[loss=0.1062, beats_loss=0.01076, ecapa_loss=0.0001607, whisper_loss=0.09382, over 20622.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01063, ecapa_loss=0.0001598, whisper_loss=0.09224, over 3926312.47 frames. ], batch size: 81, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:24:54,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2435230.0, ans=0.125 2024-08-14 02:25:18,379 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.55 vs. limit=15.0 2024-08-14 02:25:32,242 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.452e+01 2.685e+01 3.038e+01 4.511e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-14 02:25:32,533 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
28 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-14 02:25:40,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2435530.0, ans=0.2 2024-08-14 02:25:44,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2435530.0, ans=0.0 2024-08-14 02:25:58,636 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.277e-02 2024-08-14 02:26:05,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2435730.0, ans=0.1 2024-08-14 02:26:06,690 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 11700, loss[loss=0.1077, beats_loss=0.01056, ecapa_loss=0.0001513, whisper_loss=0.09566, over 22649.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01073, ecapa_loss=0.000159, whisper_loss=0.09195, over 3928715.96 frames. ], batch size: 91, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:26:25,569 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.17 vs. limit=15.0 2024-08-14 02:26:29,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2435830.0, ans=0.0 2024-08-14 02:26:35,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2435830.0, ans=0.0 2024-08-14 02:26:43,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2435930.0, ans=0.1 2024-08-14 02:27:23,956 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 11750, loss[loss=0.1154, beats_loss=0.00799, ecapa_loss=0.00018, whisper_loss=0.1056, over 23064.00 frames. 
], tot_loss[loss=0.1041, beats_loss=0.01079, ecapa_loss=0.0001593, whisper_loss=0.09169, over 3945497.72 frames. ], batch size: 93, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:27:35,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2436230.0, ans=0.125 2024-08-14 02:27:39,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2436330.0, ans=0.1 2024-08-14 02:27:46,993 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-14 02:27:50,389 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 02:27:51,807 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 20 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 02:28:03,318 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.416e+01 2.659e+01 3.015e+01 1.752e+02, threshold=5.317e+01, percent-clipped=1.0 2024-08-14 02:28:15,671 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-14 02:28:28,881 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-14 02:28:36,666 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=3.098e-02 2024-08-14 02:28:37,418 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 11800, loss[loss=0.1086, beats_loss=0.01017, ecapa_loss=0.0001593, whisper_loss=0.09689, over 17102.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01087, ecapa_loss=0.0001578, whisper_loss=0.09089, over 3930246.87 frames. ], batch size: 69, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:28:37,511 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
21 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-14 02:29:10,424 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 02:29:11,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2436930.0, ans=0.125 2024-08-14 02:29:20,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2437030.0, ans=0.1 2024-08-14 02:29:31,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2437130.0, ans=0.2 2024-08-14 02:29:37,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2437130.0, ans=0.0 2024-08-14 02:29:46,159 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 11850, loss[loss=0.1091, beats_loss=0.008768, ecapa_loss=0.000189, whisper_loss=0.0984, over 19327.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01089, ecapa_loss=0.0001572, whisper_loss=0.09083, over 3945852.87 frames. ], batch size: 80, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:29:46,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2437230.0, ans=0.125 2024-08-14 02:29:48,438 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.77 vs. limit=22.5 2024-08-14 02:29:51,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=2437230.0, ans=15.0 2024-08-14 02:29:52,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2437230.0, ans=0.0 2024-08-14 02:29:57,159 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 
13 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-14 02:29:57,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2437230.0, ans=0.1 2024-08-14 02:30:11,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2437330.0, ans=0.0 2024-08-14 02:30:23,380 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.460e+01 2.783e+01 3.243e+01 6.982e+01, threshold=5.565e+01, percent-clipped=2.0 2024-08-14 02:30:39,611 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 20 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 02:30:55,393 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 11900, loss[loss=0.1313, beats_loss=0.009346, ecapa_loss=0.000172, whisper_loss=0.1203, over 22972.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01084, ecapa_loss=0.0001574, whisper_loss=0.091, over 3920155.38 frames. ], batch size: 91, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:30:55,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2437730.0, ans=0.0 2024-08-14 02:31:23,102 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-14 02:31:38,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2438030.0, ans=0.0 2024-08-14 02:31:48,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2438130.0, ans=0.125 2024-08-14 02:31:50,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2438130.0, ans=0.2 2024-08-14 02:32:03,876 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 11950, loss[loss=0.09377, beats_loss=0.009409, ecapa_loss=0.0001796, whisper_loss=0.08256, over 17713.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01079, ecapa_loss=0.000158, whisper_loss=0.09117, over 3908875.21 frames. ], batch size: 73, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:32:08,442 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 02:32:20,670 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.96 vs. limit=6.0 2024-08-14 02:32:31,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2438430.0, ans=0.0 2024-08-14 02:32:32,774 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-14 02:32:34,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2438430.0, ans=0.125 2024-08-14 02:32:37,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2438430.0, ans=0.2 2024-08-14 02:32:42,585 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.387e+01 2.634e+01 2.951e+01 4.369e+01, threshold=5.267e+01, percent-clipped=0.0 2024-08-14 02:32:44,842 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.51 vs. limit=15.0 2024-08-14 02:32:51,041 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=21.47 vs. limit=22.5 2024-08-14 02:32:53,433 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.51 vs. limit=15.0 2024-08-14 02:33:05,311 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
31 from LS+wenet, 10 from Vox, 49 fro AS 2024-08-14 02:33:14,090 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 02:33:15,491 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 12000, loss[loss=0.09551, beats_loss=0.01023, ecapa_loss=0.0001439, whisper_loss=0.08384, over 17178.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01078, ecapa_loss=0.0001585, whisper_loss=0.09175, over 3881982.60 frames. ], batch size: 66, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:33:15,492 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-14 02:34:00,794 INFO [train_multi_KD3.py:1149] (2/4) Epoch 17, validation on ASR_libri: loss=0.2528, beats_loss=0, ecapa_loss=0.0005541, whisper_loss=0.2473, over 922467.00 frames. 2024-08-14 02:34:21,470 INFO [train_multi_KD3.py:1149] (2/4) Epoch 17, validation on SV_voxceleb1: loss=0.004448, beats_loss=0, ecapa_loss=0.0004448, whisper_loss=0, over 939242.00 frames. 2024-08-14 02:36:27,297 INFO [train_multi_KD3.py:1149] (2/4) Epoch 17, validation on AT_audioset: loss=0.02358, beats_loss=0.02358, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 02:36:27,301 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-14 02:36:30,522 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.98 vs. limit=15.0 2024-08-14 02:36:56,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2438930.0, ans=0.1 2024-08-14 02:37:05,795 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-14 02:37:36,009 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 12050, loss[loss=0.1415, beats_loss=0.008067, ecapa_loss=0.0001536, whisper_loss=0.1319, over 15668.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01078, ecapa_loss=0.0001583, whisper_loss=0.09137, over 3897644.12 frames. ], batch size: 57, lr: 3.65e-03, grad_scale: 1.152921504606847e+18 2024-08-14 02:37:43,015 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-14 02:37:50,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2439330.0, ans=0.1 2024-08-14 02:37:52,872 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 18 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-14 02:37:55,583 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-14 02:38:01,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2439330.0, ans=0.125 2024-08-14 02:38:02,745 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.45 vs. limit=22.5 2024-08-14 02:38:12,357 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.85 vs. limit=15.0 2024-08-14 02:38:14,309 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.389e+01 2.572e+01 2.864e+01 7.729e+01, threshold=5.144e+01, percent-clipped=2.0 2024-08-14 02:38:44,420 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 12100, loss[loss=0.1194, beats_loss=0.008832, ecapa_loss=0.000174, whisper_loss=0.1089, over 17784.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01073, ecapa_loss=0.0001586, whisper_loss=0.09164, over 3898354.91 frames. ], batch size: 68, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:39:15,584 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-14 02:39:29,537 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.65 vs. limit=15.0 2024-08-14 02:39:37,193 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 27 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-14 02:39:43,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2440030.0, ans=0.0 2024-08-14 02:40:01,451 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 12150, loss[loss=0.09205, beats_loss=0.01084, ecapa_loss=0.000154, whisper_loss=0.07967, over 20533.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01068, ecapa_loss=0.0001608, whisper_loss=0.09185, over 3914189.13 frames. ], batch size: 84, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:40:04,476 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 02:40:12,249 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-14 02:40:44,595 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.448e+01 2.795e+01 3.138e+01 2.484e+02, threshold=5.590e+01, percent-clipped=2.0 2024-08-14 02:40:48,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2440530.0, ans=0.1 2024-08-14 02:40:56,073 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-14 02:41:06,647 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
25 from LS+wenet, 26 from Vox, 20 fro AS 2024-08-14 02:41:17,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2440730.0, ans=0.0 2024-08-14 02:41:18,178 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 12200, loss[loss=0.09521, beats_loss=0.00966, ecapa_loss=0.0001644, whisper_loss=0.0839, over 16461.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01071, ecapa_loss=0.0001598, whisper_loss=0.09148, over 3915539.29 frames. ], batch size: 65, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:41:18,376 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 20 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-14 02:41:20,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2440730.0, ans=0.0 2024-08-14 02:41:44,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2440830.0, ans=0.0 2024-08-14 02:41:47,899 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.82 vs. limit=15.0 2024-08-14 02:41:59,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2440930.0, ans=0.0 2024-08-14 02:42:31,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2441230.0, ans=0.125 2024-08-14 02:42:33,191 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 12250, loss[loss=0.08615, beats_loss=0.01293, ecapa_loss=0.0001342, whisper_loss=0.07188, over 22758.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01076, ecapa_loss=0.0001598, whisper_loss=0.0914, over 3913302.10 frames. ], batch size: 94, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:42:39,686 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
22 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-14 02:42:39,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2441230.0, ans=0.125 2024-08-14 02:42:42,267 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-14 02:42:49,847 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.54 vs. limit=15.0 2024-08-14 02:43:08,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2441430.0, ans=0.0 2024-08-14 02:43:11,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2441430.0, ans=0.1 2024-08-14 02:43:13,640 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.81 vs. limit=15.0 2024-08-14 02:43:14,068 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.529e+01 2.845e+01 3.228e+01 1.360e+02, threshold=5.691e+01, percent-clipped=2.0 2024-08-14 02:43:16,544 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.95 vs. limit=6.0 2024-08-14 02:43:31,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2441630.0, ans=0.0 2024-08-14 02:43:35,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2441630.0, ans=0.04949747468305833 2024-08-14 02:43:37,968 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
24 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-14 02:43:46,776 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 12300, loss[loss=0.1093, beats_loss=0.0121, ecapa_loss=0.0001259, whisper_loss=0.0959, over 14836.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01074, ecapa_loss=0.0001604, whisper_loss=0.09094, over 3890103.14 frames. ], batch size: 58, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:44:02,666 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 02:44:15,606 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 02:44:18,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2441930.0, ans=0.125 2024-08-14 02:44:22,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2441930.0, ans=0.1 2024-08-14 02:44:30,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2442030.0, ans=0.125 2024-08-14 02:44:46,479 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 02:44:51,088 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=3.726e-02 2024-08-14 02:44:52,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2442130.0, ans=0.125 2024-08-14 02:44:53,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2442130.0, ans=0.0 2024-08-14 02:44:54,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2442130.0, ans=0.04949747468305833 2024-08-14 02:44:56,996 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 12350, loss[loss=0.1015, beats_loss=0.0106, ecapa_loss=0.0001617, whisper_loss=0.08927, over 21480.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01075, ecapa_loss=0.0001614, whisper_loss=0.09087, over 3890583.49 frames. ], batch size: 87, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:45:09,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2442330.0, ans=0.1 2024-08-14 02:45:34,492 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.759e+01 2.343e+01 2.707e+01 2.893e+01 7.539e+01, threshold=5.413e+01, percent-clipped=2.0 2024-08-14 02:46:02,567 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.31 vs. limit=10.0 2024-08-14 02:46:03,070 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 12400, loss[loss=0.09839, beats_loss=0.009894, ecapa_loss=0.0001585, whisper_loss=0.08691, over 18695.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01069, ecapa_loss=0.0001611, whisper_loss=0.09088, over 3855722.93 frames. 
], batch size: 74, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:46:05,886 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 16 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-14 02:46:08,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2442730.0, ans=0.125 2024-08-14 02:46:12,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2442730.0, ans=0.125 2024-08-14 02:46:28,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2442930.0, ans=0.125 2024-08-14 02:46:30,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2442930.0, ans=0.125 2024-08-14 02:46:37,607 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 23 from LS+wenet, 29 from Vox, 42 fro AS 2024-08-14 02:46:56,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2443130.0, ans=0.125 2024-08-14 02:47:01,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2443130.0, ans=0.1 2024-08-14 02:47:07,514 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 12450, loss[loss=0.0933, beats_loss=0.01176, ecapa_loss=0.0001734, whisper_loss=0.0798, over 16968.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01068, ecapa_loss=0.0001612, whisper_loss=0.09096, over 3871861.03 frames. ], batch size: 71, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:47:14,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2443230.0, ans=0.05 2024-08-14 02:47:30,130 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
23 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-14 02:47:34,139 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-14 02:47:34,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2443430.0, ans=0.0 2024-08-14 02:47:39,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2443430.0, ans=0.1 2024-08-14 02:47:44,036 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.385e+01 2.629e+01 3.074e+01 4.896e+01, threshold=5.258e+01, percent-clipped=0.0 2024-08-14 02:47:46,764 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 29 from LS+wenet, 35 from Vox, 32 fro AS 2024-08-14 02:47:48,551 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 02:47:57,196 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 02:48:05,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2443630.0, ans=0.09899494936611666 2024-08-14 02:48:12,819 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 12500, loss[loss=0.1053, beats_loss=0.009721, ecapa_loss=0.0002146, whisper_loss=0.09345, over 20625.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01067, ecapa_loss=0.00016, whisper_loss=0.09141, over 3862845.59 frames. ], batch size: 88, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:48:30,360 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.889e+00 2024-08-14 02:48:35,029 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
31 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-14 02:48:41,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2443930.0, ans=0.125 2024-08-14 02:48:43,713 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-14 02:48:58,903 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.65 vs. limit=15.0 2024-08-14 02:49:17,704 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 12550, loss[loss=0.07906, beats_loss=0.01368, ecapa_loss=0.0001166, whisper_loss=0.06422, over 20623.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01071, ecapa_loss=0.0001587, whisper_loss=0.09111, over 3884421.47 frames. ], batch size: 84, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:49:19,250 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 13 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-14 02:49:20,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2444230.0, ans=0.1 2024-08-14 02:49:25,249 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=15.0 2024-08-14 02:49:54,367 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.347e+01 2.679e+01 3.056e+01 5.301e+01, threshold=5.357e+01, percent-clipped=1.0 2024-08-14 02:49:58,541 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 02:50:01,178 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-14 02:50:11,396 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
26 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-14 02:50:17,132 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.94 vs. limit=15.0 2024-08-14 02:50:18,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2444630.0, ans=0.0 2024-08-14 02:50:22,814 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 12600, loss[loss=0.103, beats_loss=0.01292, ecapa_loss=0.000179, whisper_loss=0.08826, over 21839.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01075, ecapa_loss=0.0001589, whisper_loss=0.0916, over 3895412.72 frames. ], batch size: 92, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:50:37,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2444830.0, ans=0.1 2024-08-14 02:50:49,681 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 02:50:59,538 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.50 vs. limit=22.5 2024-08-14 02:50:59,945 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 02:51:02,990 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.95 vs. limit=15.0 2024-08-14 02:51:03,205 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.31 vs. 
limit=15.0 2024-08-14 02:51:09,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2445030.0, ans=0.95 2024-08-14 02:51:23,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2445130.0, ans=0.125 2024-08-14 02:51:27,253 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 12650, loss[loss=0.09372, beats_loss=0.007899, ecapa_loss=0.0001843, whisper_loss=0.08398, over 15682.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01077, ecapa_loss=0.0001587, whisper_loss=0.09209, over 3873113.99 frames. ], batch size: 63, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:51:27,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2445230.0, ans=0.2 2024-08-14 02:51:30,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2445230.0, ans=0.0 2024-08-14 02:52:01,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2445430.0, ans=0.5 2024-08-14 02:52:03,677 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.374e+01 2.633e+01 2.976e+01 1.427e+02, threshold=5.265e+01, percent-clipped=1.0 2024-08-14 02:52:28,085 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.38 vs. limit=15.0 2024-08-14 02:52:29,846 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-14 02:52:32,218 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 12700, loss[loss=0.09908, beats_loss=0.01121, ecapa_loss=0.000125, whisper_loss=0.08662, over 22544.00 frames. 
], tot_loss[loss=0.1046, beats_loss=0.01072, ecapa_loss=0.0001593, whisper_loss=0.09231, over 3851195.31 frames. ], batch size: 87, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:52:36,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2445730.0, ans=0.125 2024-08-14 02:52:37,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2445730.0, ans=0.125 2024-08-14 02:52:44,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2445830.0, ans=0.1 2024-08-14 02:52:45,428 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 13 from LS+wenet, 11 from Vox, 40 fro AS 2024-08-14 02:52:45,885 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.75 vs. limit=12.0 2024-08-14 02:52:57,111 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 02:53:02,756 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-14 02:53:09,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2445930.0, ans=0.125 2024-08-14 02:53:36,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2446230.0, ans=0.0 2024-08-14 02:53:37,655 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 12750, loss[loss=0.1141, beats_loss=0.01162, ecapa_loss=0.000162, whisper_loss=0.1008, over 21932.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01073, ecapa_loss=0.0001595, whisper_loss=0.09221, over 3846178.64 frames. ], batch size: 90, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:54:07,820 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
18 from LS+wenet, 20 from Vox, 16 fro AS 2024-08-14 02:54:14,222 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.458e+01 2.855e+01 3.170e+01 1.362e+02, threshold=5.709e+01, percent-clipped=3.0 2024-08-14 02:54:15,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2446530.0, ans=0.2 2024-08-14 02:54:25,891 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 27 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-14 02:54:40,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2446630.0, ans=0.05 2024-08-14 02:54:42,250 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 12800, loss[loss=0.1102, beats_loss=0.01072, ecapa_loss=0.0001641, whisper_loss=0.09784, over 23383.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01074, ecapa_loss=0.0001593, whisper_loss=0.0923, over 3837071.45 frames. ], batch size: 95, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:54:48,893 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 21 from LS+wenet, 31 from Vox, 39 fro AS 2024-08-14 02:55:03,357 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 02:55:07,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2446930.0, ans=0.07 2024-08-14 02:55:11,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2446930.0, ans=0.125 2024-08-14 02:55:14,607 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.53 vs. 
limit=15.0 2024-08-14 02:55:15,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2446930.0, ans=0.0 2024-08-14 02:55:19,501 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.40 vs. limit=15.0 2024-08-14 02:55:42,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=2447130.0, ans=0.02 2024-08-14 02:55:45,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2447130.0, ans=0.0 2024-08-14 02:55:47,789 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 12850, loss[loss=0.1041, beats_loss=0.009505, ecapa_loss=0.0001953, whisper_loss=0.09261, over 23205.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01086, ecapa_loss=0.0001599, whisper_loss=0.09099, over 3860111.42 frames. ], batch size: 93, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:56:07,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2447330.0, ans=0.125 2024-08-14 02:56:10,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2447330.0, ans=0.2 2024-08-14 02:56:17,605 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
21 from LS+wenet, 29 from Vox, 23 fro AS 2024-08-14 02:56:23,759 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.411e+01 2.741e+01 3.118e+01 1.301e+02, threshold=5.482e+01, percent-clipped=1.0 2024-08-14 02:56:24,660 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=6.526e-02 2024-08-14 02:56:33,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2447530.0, ans=0.0 2024-08-14 02:56:41,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2447630.0, ans=0.125 2024-08-14 02:56:42,591 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 25 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-14 02:56:44,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2447630.0, ans=0.0 2024-08-14 02:56:49,335 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-14 02:56:53,053 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 12900, loss[loss=0.1059, beats_loss=0.01134, ecapa_loss=0.0001543, whisper_loss=0.09304, over 20073.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01083, ecapa_loss=0.0001595, whisper_loss=0.09083, over 3804971.10 frames. ], batch size: 80, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:56:53,233 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 33 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-14 02:56:59,591 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-14 02:56:59,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2447730.0, ans=0.125 2024-08-14 02:57:12,962 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
31 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-14 02:57:17,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2447830.0, ans=0.1 2024-08-14 02:57:27,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2447930.0, ans=0.0 2024-08-14 02:57:39,484 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.16 vs. limit=6.0 2024-08-14 02:57:47,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2448130.0, ans=0.125 2024-08-14 02:57:49,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2448130.0, ans=0.125 2024-08-14 02:57:52,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2448130.0, ans=0.1 2024-08-14 02:57:54,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2448130.0, ans=0.0 2024-08-14 02:58:01,026 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.08 vs. limit=10.0 2024-08-14 02:58:01,625 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 12950, loss[loss=0.1054, beats_loss=0.009852, ecapa_loss=0.0002022, whisper_loss=0.09355, over 22283.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01078, ecapa_loss=0.0001608, whisper_loss=0.09079, over 3799199.82 frames. ], batch size: 93, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:58:17,949 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-14 02:58:27,616 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
24 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-14 02:58:28,829 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-14 02:58:37,549 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.59 vs. limit=15.0 2024-08-14 02:58:38,730 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.34 vs. limit=15.0 2024-08-14 02:58:41,077 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.282e+01 2.587e+01 2.877e+01 4.043e+01, threshold=5.173e+01, percent-clipped=0.0 2024-08-14 02:58:49,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2448530.0, ans=0.125 2024-08-14 02:58:55,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2448530.0, ans=0.0 2024-08-14 02:59:00,875 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-14 02:59:12,065 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 13000, loss[loss=0.1198, beats_loss=0.009366, ecapa_loss=0.00016, whisper_loss=0.1088, over 14752.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01084, ecapa_loss=0.0001592, whisper_loss=0.09076, over 3831110.45 frames. ], batch size: 60, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:59:18,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2448730.0, ans=0.1 2024-08-14 02:59:22,474 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.04 vs. 
limit=22.5 2024-08-14 02:59:41,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2448930.0, ans=0.125 2024-08-14 02:59:47,816 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.49 vs. limit=12.0 2024-08-14 03:00:21,742 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=15.0 2024-08-14 03:00:26,775 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.55 vs. limit=22.5 2024-08-14 03:00:27,260 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 13050, loss[loss=0.1085, beats_loss=0.009895, ecapa_loss=0.000161, whisper_loss=0.097, over 19603.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01074, ecapa_loss=0.0001606, whisper_loss=0.0909, over 3807888.38 frames. ], batch size: 78, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:00:27,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2449230.0, ans=0.2 2024-08-14 03:01:06,080 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 03:01:09,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2449430.0, ans=0.0 2024-08-14 03:01:12,125 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-14 03:01:18,466 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.546e+01 2.785e+01 3.142e+01 1.124e+02, threshold=5.570e+01, percent-clipped=2.0 2024-08-14 03:01:22,650 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
26 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 03:01:27,505 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.89 vs. limit=15.0 2024-08-14 03:01:36,622 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-14 03:02:03,155 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 13100, loss[loss=0.1316, beats_loss=0.00842, ecapa_loss=0.0001972, whisper_loss=0.1212, over 19888.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01074, ecapa_loss=0.0001606, whisper_loss=0.09055, over 3803138.94 frames. ], batch size: 81, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:02:15,422 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-14 03:02:16,494 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.09 vs. limit=15.0 2024-08-14 03:02:53,971 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 12 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-14 03:03:17,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2450030.0, ans=0.125 2024-08-14 03:03:53,829 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 13150, loss[loss=0.08627, beats_loss=0.01345, ecapa_loss=0.0001626, whisper_loss=0.0712, over 18450.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01082, ecapa_loss=0.0001589, whisper_loss=0.09008, over 3801051.29 frames. ], batch size: 78, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:04:01,712 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 03:04:31,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2450330.0, ans=0.125 2024-08-14 03:04:42,913 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 24 from LS+wenet, 33 from Vox, 29 fro AS 2024-08-14 03:04:42,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2450330.0, ans=0.125 2024-08-14 03:04:45,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2450430.0, ans=0.0 2024-08-14 03:04:48,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2450430.0, ans=0.0 2024-08-14 03:04:59,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2450430.0, ans=0.125 2024-08-14 03:05:00,791 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-14 03:05:09,514 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.343e+01 2.636e+01 2.918e+01 3.888e+01, threshold=5.273e+01, percent-clipped=0.0 2024-08-14 03:05:24,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2450530.0, ans=0.125 2024-08-14 03:05:30,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2450530.0, ans=0.0 2024-08-14 03:05:46,769 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.39 vs. limit=15.0 2024-08-14 03:05:48,376 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
26 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 03:05:48,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2450630.0, ans=0.0 2024-08-14 03:05:51,832 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.77 vs. limit=22.5 2024-08-14 03:06:09,002 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 13200, loss[loss=0.1083, beats_loss=0.01216, ecapa_loss=0.0001575, whisper_loss=0.09458, over 23418.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0108, ecapa_loss=0.0001578, whisper_loss=0.09022, over 3825241.24 frames. ], batch size: 94, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:06:17,983 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-14 03:06:47,880 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 31 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 03:06:51,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2450830.0, ans=0.0 2024-08-14 03:06:59,321 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 32 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-14 03:07:32,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2451030.0, ans=0.125 2024-08-14 03:07:32,553 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.65 vs. limit=15.0 2024-08-14 03:08:09,933 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.51 vs. 
limit=15.0 2024-08-14 03:08:12,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2451130.0, ans=0.125 2024-08-14 03:08:16,152 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 13250, loss[loss=0.1205, beats_loss=0.008388, ecapa_loss=0.0001792, whisper_loss=0.1103, over 21517.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01068, ecapa_loss=0.0001597, whisper_loss=0.09166, over 3855848.88 frames. ], batch size: 87, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:08:31,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2451230.0, ans=0.125 2024-08-14 03:08:40,945 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.28 vs. limit=22.5 2024-08-14 03:08:49,259 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-14 03:09:12,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2451430.0, ans=0.04949747468305833 2024-08-14 03:09:27,024 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.488e+01 2.774e+01 3.161e+01 6.895e+01, threshold=5.548e+01, percent-clipped=1.0 2024-08-14 03:09:37,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2451530.0, ans=0.0 2024-08-14 03:09:48,226 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.94 vs. 
limit=22.5 2024-08-14 03:10:01,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=2451730.0, ans=10.0 2024-08-14 03:10:01,988 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 13300, loss[loss=0.1101, beats_loss=0.01172, ecapa_loss=0.0001498, whisper_loss=0.0969, over 22644.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01069, ecapa_loss=0.00016, whisper_loss=0.09143, over 3826933.56 frames. ], batch size: 91, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:10:07,686 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-14 03:10:12,017 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 40 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 03:10:13,875 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-14 03:10:56,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2452030.0, ans=0.125 2024-08-14 03:10:58,027 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-14 03:11:04,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2452030.0, ans=0.1 2024-08-14 03:11:15,674 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 03:11:22,522 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.110e-02 2024-08-14 03:11:26,829 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 13350, loss[loss=0.1019, beats_loss=0.01114, ecapa_loss=0.00015, whisper_loss=0.08922, over 19392.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01068, ecapa_loss=0.0001586, whisper_loss=0.09178, over 3840998.47 frames. 
], batch size: 78, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:11:39,059 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-14 03:11:52,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2452330.0, ans=0.0 2024-08-14 03:11:58,579 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-14 03:12:12,720 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.377e+01 2.695e+01 3.024e+01 3.722e+01, threshold=5.391e+01, percent-clipped=0.0 2024-08-14 03:12:14,706 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-14 03:12:15,413 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.26 vs. limit=15.0 2024-08-14 03:12:16,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2452530.0, ans=0.125 2024-08-14 03:12:17,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2452530.0, ans=0.0 2024-08-14 03:12:23,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2452530.0, ans=0.1 2024-08-14 03:12:23,685 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.10 vs. limit=10.0 2024-08-14 03:12:24,320 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
29 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-14 03:12:41,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2452630.0, ans=0.125 2024-08-14 03:12:44,709 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 16 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-14 03:12:48,253 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 13400, loss[loss=0.1049, beats_loss=0.01258, ecapa_loss=0.0001264, whisper_loss=0.09105, over 17621.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01077, ecapa_loss=0.0001575, whisper_loss=0.09165, over 3840802.05 frames. ], batch size: 69, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:12:54,397 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 21 from Vox, 49 fro AS 2024-08-14 03:12:55,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2452730.0, ans=0.0 2024-08-14 03:13:00,755 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-14 03:13:22,438 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 21 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-14 03:13:24,390 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.63 vs. limit=15.0 2024-08-14 03:13:27,972 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 30 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-14 03:13:39,134 WARNING [optim.py:496] (2/4) Scaling gradients by 0.06988123059272766, model_norm_threshold=53.90802001953125 2024-08-14 03:13:39,327 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.25, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.512e+05, grad_sumsq=1.512e+05, orig_rms_sq=1.000e+00 2024-08-14 03:13:41,642 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
29 from LS+wenet, 12 from Vox, 44 fro AS 2024-08-14 03:13:49,777 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 29 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-14 03:13:51,419 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 25 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 03:13:57,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2453130.0, ans=0.0 2024-08-14 03:14:05,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2453130.0, ans=0.125 2024-08-14 03:14:08,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2453230.0, ans=0.125 2024-08-14 03:14:09,239 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 13450, loss[loss=0.1247, beats_loss=0.01053, ecapa_loss=0.0001266, whisper_loss=0.1129, over 15708.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01078, ecapa_loss=0.0001581, whisper_loss=0.09175, over 3851331.59 frames. ], batch size: 57, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:14:26,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2453330.0, ans=0.07 2024-08-14 03:14:50,993 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.11 vs. 
limit=15.0 2024-08-14 03:14:55,086 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.489e+01 2.722e+01 3.204e+01 7.714e+02, threshold=5.444e+01, percent-clipped=1.0 2024-08-14 03:14:55,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2453430.0, ans=0.0 2024-08-14 03:14:58,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2453530.0, ans=0.125 2024-08-14 03:15:15,254 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-14 03:15:23,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2453630.0, ans=0.0 2024-08-14 03:15:27,201 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 13500, loss[loss=0.09879, beats_loss=0.009386, ecapa_loss=0.0001973, whisper_loss=0.08743, over 19717.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0107, ecapa_loss=0.00016, whisper_loss=0.09224, over 3848814.03 frames. ], batch size: 82, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:15:37,117 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 38 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-14 03:15:46,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2453830.0, ans=0.0 2024-08-14 03:16:18,094 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 03:16:23,326 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-14 03:16:30,162 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-14 03:16:31,630 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
21 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-14 03:16:34,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2454130.0, ans=0.125 2024-08-14 03:16:36,640 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 13550, loss[loss=0.1001, beats_loss=0.01028, ecapa_loss=0.0001862, whisper_loss=0.08797, over 17601.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0107, ecapa_loss=0.0001596, whisper_loss=0.09217, over 3893684.12 frames. ], batch size: 75, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:16:45,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2454230.0, ans=0.125 2024-08-14 03:17:03,851 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-14 03:17:11,612 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-14 03:17:12,745 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.332e+01 2.621e+01 2.776e+01 5.086e+01, threshold=5.241e+01, percent-clipped=0.0 2024-08-14 03:17:16,619 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-14 03:17:22,964 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-14 03:17:28,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2454630.0, ans=0.2 2024-08-14 03:17:29,362 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
17 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 03:17:29,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2454630.0, ans=0.125 2024-08-14 03:17:36,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2454630.0, ans=0.0 2024-08-14 03:17:41,329 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 13600, loss[loss=0.1092, beats_loss=0.009491, ecapa_loss=0.0001625, whisper_loss=0.09812, over 23528.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01069, ecapa_loss=0.0001588, whisper_loss=0.09268, over 3913765.12 frames. ], batch size: 93, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:17:48,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2454730.0, ans=0.125 2024-08-14 03:18:06,532 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-14 03:18:09,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2454930.0, ans=0.125 2024-08-14 03:18:12,808 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 03:18:46,910 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 13650, loss[loss=0.1063, beats_loss=0.01199, ecapa_loss=0.0001373, whisper_loss=0.09298, over 22605.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01076, ecapa_loss=0.0001601, whisper_loss=0.0922, over 3898387.51 frames. 
], batch size: 91, lr: 3.63e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:18:57,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2455230.0, ans=0.0 2024-08-14 03:19:01,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2455330.0, ans=0.0 2024-08-14 03:19:12,966 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-14 03:19:24,891 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.360e+01 2.649e+01 3.081e+01 1.605e+02, threshold=5.298e+01, percent-clipped=1.0 2024-08-14 03:19:57,143 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 13700, loss[loss=0.09189, beats_loss=0.01242, ecapa_loss=0.0001698, whisper_loss=0.07777, over 18966.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01084, ecapa_loss=0.0001603, whisper_loss=0.09189, over 3892483.31 frames. ], batch size: 78, lr: 3.63e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:20:13,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2455830.0, ans=0.125 2024-08-14 03:20:14,816 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-14 03:20:24,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2455830.0, ans=0.125 2024-08-14 03:20:38,403 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-14 03:20:49,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2456030.0, ans=0.0 2024-08-14 03:20:56,329 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
25 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-14 03:21:00,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2456130.0, ans=0.125 2024-08-14 03:21:10,976 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 13750, loss[loss=0.1116, beats_loss=0.01003, ecapa_loss=0.0001498, whisper_loss=0.1001, over 23336.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01079, ecapa_loss=0.0001587, whisper_loss=0.09202, over 3894900.33 frames. ], batch size: 91, lr: 3.63e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:21:19,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2456230.0, ans=0.125 2024-08-14 03:21:20,449 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 24 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 03:21:30,343 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.94 vs. limit=22.5 2024-08-14 03:21:40,972 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 03:21:44,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2456430.0, ans=0.125 2024-08-14 03:21:52,278 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 14 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-14 03:21:54,667 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.289e+01 2.530e+01 2.894e+01 7.886e+01, threshold=5.061e+01, percent-clipped=1.0 2024-08-14 03:22:07,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2456530.0, ans=0.125 2024-08-14 03:22:13,422 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
22 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-14 03:22:13,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2456630.0, ans=0.125 2024-08-14 03:22:15,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2456630.0, ans=0.1 2024-08-14 03:22:27,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2456730.0, ans=0.1 2024-08-14 03:22:28,425 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 13800, loss[loss=0.09822, beats_loss=0.007829, ecapa_loss=0.0001461, whisper_loss=0.08893, over 17789.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0108, ecapa_loss=0.0001576, whisper_loss=0.09195, over 3885578.13 frames. ], batch size: 65, lr: 3.63e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:22:51,264 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 03:23:22,792 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.59 vs. limit=10.0 2024-08-14 03:23:32,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2457130.0, ans=0.0 2024-08-14 03:23:48,977 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 13850, loss[loss=0.1056, beats_loss=0.01196, ecapa_loss=0.0001358, whisper_loss=0.09224, over 16133.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01081, ecapa_loss=0.0001583, whisper_loss=0.09118, over 3910026.17 frames. ], batch size: 64, lr: 3.63e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:24:01,501 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
23 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-14 03:24:02,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2457230.0, ans=0.0 2024-08-14 03:24:06,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2457330.0, ans=0.125 2024-08-14 03:24:24,223 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-14 03:24:32,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2457430.0, ans=0.1 2024-08-14 03:24:35,968 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.485e+01 2.798e+01 3.130e+01 4.713e+02, threshold=5.595e+01, percent-clipped=2.0 2024-08-14 03:24:41,888 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0 2024-08-14 03:24:45,402 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0 2024-08-14 03:24:46,727 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.370e+05 2024-08-14 03:24:52,383 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.93 vs. limit=10.0 2024-08-14 03:24:58,092 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
24 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-14 03:24:59,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2457630.0, ans=0.2 2024-08-14 03:25:11,272 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 13900, loss[loss=0.09061, beats_loss=0.01634, ecapa_loss=0.0001331, whisper_loss=0.07294, over 17469.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01081, ecapa_loss=0.0001569, whisper_loss=0.09164, over 3911008.36 frames. ], batch size: 70, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:25:16,573 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-14 03:25:20,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2457730.0, ans=0.1 2024-08-14 03:25:29,572 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 18 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-14 03:25:39,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2457830.0, ans=0.125 2024-08-14 03:25:40,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2457830.0, ans=0.125 2024-08-14 03:25:44,274 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 03:25:45,882 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 03:25:52,111 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-14 03:25:57,000 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-14 03:26:03,848 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
16 from LS+wenet, 9 from Vox, 29 fro AS 2024-08-14 03:26:21,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2458130.0, ans=0.125 2024-08-14 03:26:31,446 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.61 vs. limit=15.0 2024-08-14 03:26:34,147 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 13950, loss[loss=0.1132, beats_loss=0.007533, ecapa_loss=0.0001367, whisper_loss=0.1043, over 15882.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01082, ecapa_loss=0.0001575, whisper_loss=0.09102, over 3907058.86 frames. ], batch size: 57, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:26:42,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2458230.0, ans=0.0 2024-08-14 03:26:52,254 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-14 03:26:57,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2458330.0, ans=0.0 2024-08-14 03:26:59,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2458330.0, ans=0.0 2024-08-14 03:27:03,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2458330.0, ans=0.125 2024-08-14 03:27:19,778 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.722e+01 2.317e+01 2.641e+01 2.864e+01 9.900e+01, threshold=5.282e+01, percent-clipped=1.0 2024-08-14 03:27:52,773 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 14000, loss[loss=0.1048, beats_loss=0.01031, ecapa_loss=0.0001674, whisper_loss=0.09282, over 21986.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01074, ecapa_loss=0.000157, whisper_loss=0.0916, over 3907848.45 frames. ], batch size: 89, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:27:52,941 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 03:28:25,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2458930.0, ans=0.125 2024-08-14 03:28:31,626 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 22 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-14 03:28:48,951 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-14 03:29:05,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2459130.0, ans=0.125 2024-08-14 03:29:06,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2459130.0, ans=0.125 2024-08-14 03:29:12,397 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 14050, loss[loss=0.1211, beats_loss=0.009411, ecapa_loss=0.0001991, whisper_loss=0.1097, over 22696.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01072, ecapa_loss=0.0001564, whisper_loss=0.09168, over 3904649.41 frames. ], batch size: 92, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:29:22,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2459230.0, ans=0.125 2024-08-14 03:29:34,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2459330.0, ans=0.0 2024-08-14 03:29:44,086 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
35 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-14 03:29:50,841 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.69 vs. limit=10.0 2024-08-14 03:29:55,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2459430.0, ans=0.0 2024-08-14 03:29:58,114 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.432e+01 2.589e+01 2.887e+01 3.706e+01, threshold=5.177e+01, percent-clipped=0.0 2024-08-14 03:30:07,551 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 22 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 03:30:08,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2459530.0, ans=0.2 2024-08-14 03:30:29,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2459630.0, ans=0.5 2024-08-14 03:30:31,880 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 14100, loss[loss=0.09974, beats_loss=0.009931, ecapa_loss=0.0001876, whisper_loss=0.08793, over 21474.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01065, ecapa_loss=0.0001578, whisper_loss=0.09242, over 3890632.93 frames. ], batch size: 92, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:30:32,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2459730.0, ans=0.0 2024-08-14 03:30:43,105 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.42 vs. limit=15.0 2024-08-14 03:30:52,212 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.05 vs. 
limit=15.0 2024-08-14 03:30:59,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2459830.0, ans=0.125 2024-08-14 03:31:21,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2460030.0, ans=0.0 2024-08-14 03:31:29,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2460030.0, ans=0.125 2024-08-14 03:31:45,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2460130.0, ans=0.125 2024-08-14 03:31:47,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2460130.0, ans=0.0 2024-08-14 03:31:51,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2460230.0, ans=0.0 2024-08-14 03:31:52,535 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 14150, loss[loss=0.1293, beats_loss=0.009579, ecapa_loss=0.0001343, whisper_loss=0.1184, over 24310.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01066, ecapa_loss=0.0001572, whisper_loss=0.0923, over 3869439.10 frames. ], batch size: 92, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:31:54,670 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-14 03:32:14,990 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.07 vs. limit=15.0 2024-08-14 03:32:19,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2460330.0, ans=0.125 2024-08-14 03:32:35,961 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
30 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-14 03:32:39,061 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 35 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-14 03:32:40,283 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.331e+01 2.560e+01 2.927e+01 4.747e+01, threshold=5.121e+01, percent-clipped=0.0 2024-08-14 03:32:41,230 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.65 vs. limit=10.0 2024-08-14 03:33:01,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2460630.0, ans=0.0 2024-08-14 03:33:16,372 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 14200, loss[loss=0.1013, beats_loss=0.01178, ecapa_loss=0.000143, whisper_loss=0.08811, over 15822.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0107, ecapa_loss=0.000157, whisper_loss=0.09192, over 3871970.48 frames. ], batch size: 62, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:33:55,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2460930.0, ans=0.5 2024-08-14 03:34:04,333 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-14 03:34:08,264 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-14 03:34:30,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2461130.0, ans=0.0 2024-08-14 03:34:33,933 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
25 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-14 03:34:34,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2461130.0, ans=0.125 2024-08-14 03:34:39,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2461230.0, ans=0.0 2024-08-14 03:34:40,282 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 14250, loss[loss=0.1094, beats_loss=0.009671, ecapa_loss=0.0001323, whisper_loss=0.09837, over 16526.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01066, ecapa_loss=0.0001572, whisper_loss=0.09197, over 3886394.71 frames. ], batch size: 59, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:34:51,146 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 28 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-14 03:34:51,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2461230.0, ans=0.125 2024-08-14 03:34:54,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2461330.0, ans=0.0 2024-08-14 03:35:02,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2461330.0, ans=0.1 2024-08-14 03:35:04,570 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.62 vs. limit=15.0 2024-08-14 03:35:18,002 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
24 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-14 03:35:25,906 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.288e+01 2.518e+01 2.897e+01 5.060e+01, threshold=5.036e+01, percent-clipped=0.0 2024-08-14 03:35:40,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2461530.0, ans=0.1 2024-08-14 03:35:59,774 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 14300, loss[loss=0.09298, beats_loss=0.01285, ecapa_loss=0.0001398, whisper_loss=0.07873, over 21969.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01069, ecapa_loss=0.0001567, whisper_loss=0.09145, over 3860153.98 frames. ], batch size: 90, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:36:06,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2461730.0, ans=0.07 2024-08-14 03:36:06,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2461730.0, ans=0.1 2024-08-14 03:36:11,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2461730.0, ans=0.125 2024-08-14 03:36:20,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2461830.0, ans=0.2 2024-08-14 03:36:54,543 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
16 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-14 03:37:02,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2462130.0, ans=0.125 2024-08-14 03:37:06,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2462130.0, ans=0.125 2024-08-14 03:37:06,940 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 03:37:13,847 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.43 vs. limit=15.0 2024-08-14 03:37:18,376 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 14350, loss[loss=0.09831, beats_loss=0.008673, ecapa_loss=0.0001851, whisper_loss=0.08779, over 21028.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0107, ecapa_loss=0.0001572, whisper_loss=0.09105, over 3844327.21 frames. ], batch size: 90, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:37:39,534 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 03:37:44,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2462330.0, ans=0.125 2024-08-14 03:37:48,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2462330.0, ans=0.1 2024-08-14 03:37:50,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2462430.0, ans=0.125 2024-08-14 03:37:50,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2462430.0, ans=0.1 2024-08-14 03:37:55,472 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 
18 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-14 03:37:56,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2462430.0, ans=0.125 2024-08-14 03:37:57,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2462430.0, ans=0.125 2024-08-14 03:38:01,249 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 32 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-14 03:38:03,862 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.518e+01 2.731e+01 3.066e+01 7.073e+01, threshold=5.463e+01, percent-clipped=1.0 2024-08-14 03:38:20,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2462630.0, ans=0.125 2024-08-14 03:38:24,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=2462630.0, ans=0.05 2024-08-14 03:38:31,266 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 03:38:36,971 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 14400, loss[loss=0.0982, beats_loss=0.01093, ecapa_loss=0.0001647, whisper_loss=0.08562, over 22113.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01066, ecapa_loss=0.0001588, whisper_loss=0.09133, over 3884415.29 frames. 
], batch size: 91, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:38:45,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2462730.0, ans=0.125 2024-08-14 03:38:56,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2462830.0, ans=0.125 2024-08-14 03:39:02,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2462830.0, ans=0.125 2024-08-14 03:39:19,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2462930.0, ans=0.125 2024-08-14 03:39:25,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2463030.0, ans=0.125 2024-08-14 03:39:37,138 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 37 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 03:39:53,640 INFO [train_multi_KD3.py:1116] (2/4) Epoch 17, batch 14450, loss[loss=0.09611, beats_loss=0.01303, ecapa_loss=0.0001229, whisper_loss=0.08186, over 17904.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01063, ecapa_loss=0.00016, whisper_loss=0.092, over 3898108.82 frames. ], batch size: 68, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:39:55,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2463230.0, ans=0.1 2024-08-14 03:39:57,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2463230.0, ans=0.5 2024-08-14 03:40:07,003 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
21 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 03:40:08,944 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.67 vs. limit=22.5 2024-08-14 03:40:14,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2463330.0, ans=0.125 2024-08-14 03:40:33,957 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-14 03:40:38,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2463430.0, ans=0.125 2024-08-14 03:40:38,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2463430.0, ans=0.0 2024-08-14 03:40:40,734 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.412e+01 2.642e+01 2.928e+01 4.301e+01, threshold=5.284e+01, percent-clipped=0.0 2024-08-14 03:40:42,636 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.85 vs. limit=15.0 2024-08-14 03:40:50,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2463530.0, ans=0.125 2024-08-14 03:40:54,158 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 37 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-14 03:41:05,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2463630.0, ans=0.0 2024-08-14 03:41:52,893 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 0, loss[loss=0.122, beats_loss=0.00892, ecapa_loss=0.0001646, whisper_loss=0.1114, over 23505.00 frames. ], tot_loss[loss=0.122, beats_loss=0.00892, ecapa_loss=0.0001646, whisper_loss=0.1114, over 23505.00 frames. 
], batch size: 90, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:41:52,894 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-14 03:42:32,729 INFO [train_multi_KD3.py:1149] (2/4) Epoch 18, validation on ASR_libri: loss=0.2539, beats_loss=0, ecapa_loss=0.0005528, whisper_loss=0.2483, over 922467.00 frames. 2024-08-14 03:42:48,648 INFO [train_multi_KD3.py:1149] (2/4) Epoch 18, validation on SV_voxceleb1: loss=0.004396, beats_loss=0, ecapa_loss=0.0004396, whisper_loss=0, over 939242.00 frames. 2024-08-14 03:44:37,241 INFO [train_multi_KD3.py:1149] (2/4) Epoch 18, validation on AT_audioset: loss=0.0235, beats_loss=0.0235, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 03:44:37,244 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-14 03:44:50,709 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0 2024-08-14 03:44:53,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2463720.0, ans=0.125 2024-08-14 03:45:01,468 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 19 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-14 03:45:06,571 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-14 03:45:31,332 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.71 vs. 
limit=12.0 2024-08-14 03:45:33,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2463920.0, ans=0.125 2024-08-14 03:45:51,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2464020.0, ans=0.1 2024-08-14 03:45:56,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2464020.0, ans=0.0 2024-08-14 03:46:40,484 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 50, loss[loss=0.1044, beats_loss=0.009375, ecapa_loss=0.000151, whisper_loss=0.09356, over 23327.00 frames. ], tot_loss[loss=0.102, beats_loss=0.009844, ecapa_loss=0.0001648, whisper_loss=0.0905, over 890621.35 frames. ], batch size: 89, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:47:47,101 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.625e+01 2.934e+01 3.274e+01 1.725e+02, threshold=5.869e+01, percent-clipped=1.0 2024-08-14 03:47:49,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2464520.0, ans=0.0 2024-08-14 03:47:50,866 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 15 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 03:47:53,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2464520.0, ans=0.125 2024-08-14 03:47:59,856 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
18 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-14 03:48:12,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2464620.0, ans=0.125 2024-08-14 03:48:29,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2464620.0, ans=0.125 2024-08-14 03:48:31,288 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 100, loss[loss=0.1029, beats_loss=0.009865, ecapa_loss=0.0001752, whisper_loss=0.09127, over 16513.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.009816, ecapa_loss=0.0001629, whisper_loss=0.08915, over 1518158.55 frames. ], batch size: 66, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:48:37,174 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-14 03:49:03,479 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.02 vs. limit=6.0 2024-08-14 03:49:07,308 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 22 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-14 03:49:40,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2465020.0, ans=0.1 2024-08-14 03:49:57,229 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 03:50:08,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2465120.0, ans=0.0 2024-08-14 03:50:14,155 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 150, loss[loss=0.117, beats_loss=0.009193, ecapa_loss=0.0001512, whisper_loss=0.1063, over 22640.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.009924, ecapa_loss=0.0001624, whisper_loss=0.08889, over 2030456.30 frames. 
], batch size: 89, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:50:23,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2465220.0, ans=0.1 2024-08-14 03:50:38,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2465320.0, ans=0.07 2024-08-14 03:50:43,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2465320.0, ans=0.2 2024-08-14 03:50:46,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=2465420.0, ans=0.025 2024-08-14 03:50:54,137 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-14 03:50:54,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2465420.0, ans=0.0 2024-08-14 03:51:02,759 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.250e+01 2.708e+01 3.001e+01 3.363e+01 1.526e+02, threshold=6.002e+01, percent-clipped=2.0 2024-08-14 03:51:15,005 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.09 vs. limit=22.5 2024-08-14 03:51:25,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2465620.0, ans=0.0 2024-08-14 03:51:25,538 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.23 vs. limit=15.0 2024-08-14 03:51:33,946 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 200, loss[loss=0.1059, beats_loss=0.0124, ecapa_loss=0.0001513, whisper_loss=0.09201, over 17508.00 frames. 
], tot_loss[loss=0.1011, beats_loss=0.01007, ecapa_loss=0.0001619, whisper_loss=0.08943, over 2398715.51 frames. ], batch size: 69, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:51:37,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2465720.0, ans=0.015 2024-08-14 03:51:40,199 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 14 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-14 03:51:41,406 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 03:51:59,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2465820.0, ans=0.05 2024-08-14 03:52:19,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2466020.0, ans=0.125 2024-08-14 03:52:19,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2466020.0, ans=0.1 2024-08-14 03:52:30,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2466020.0, ans=0.0 2024-08-14 03:52:55,050 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 250, loss[loss=0.1081, beats_loss=0.01073, ecapa_loss=0.0001677, whisper_loss=0.09571, over 22375.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01026, ecapa_loss=0.0001608, whisper_loss=0.08935, over 2713881.36 frames. ], batch size: 91, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:53:02,534 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
21 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-14 03:53:04,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2466220.0, ans=0.2 2024-08-14 03:53:08,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2466220.0, ans=0.1 2024-08-14 03:53:31,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2466420.0, ans=0.125 2024-08-14 03:53:36,013 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 03:53:46,346 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.439e+01 2.692e+01 3.141e+01 8.859e+01, threshold=5.385e+01, percent-clipped=1.0 2024-08-14 03:53:50,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2466520.0, ans=0.04949747468305833 2024-08-14 03:54:09,555 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 03:54:14,553 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-14 03:54:15,899 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-14 03:54:19,671 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 300, loss[loss=0.1274, beats_loss=0.007806, ecapa_loss=0.0001775, whisper_loss=0.1178, over 19960.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01035, ecapa_loss=0.0001602, whisper_loss=0.08905, over 2925524.58 frames. ], batch size: 76, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:54:24,659 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
22 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-14 03:54:34,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2466720.0, ans=0.125 2024-08-14 03:54:42,994 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 03:54:51,098 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 03:55:06,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2466920.0, ans=0.1 2024-08-14 03:55:08,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2467020.0, ans=0.0 2024-08-14 03:55:11,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2467020.0, ans=0.0 2024-08-14 03:55:12,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2467020.0, ans=0.0 2024-08-14 03:55:19,628 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.44 vs. limit=15.0 2024-08-14 03:55:26,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2467120.0, ans=0.0 2024-08-14 03:55:39,079 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 350, loss[loss=0.1112, beats_loss=0.01104, ecapa_loss=0.0001519, whisper_loss=0.09868, over 19848.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01035, ecapa_loss=0.000162, whisper_loss=0.08944, over 3118088.68 frames. ], batch size: 78, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:55:50,056 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.43 vs. 
limit=8.0 2024-08-14 03:55:57,399 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.96 vs. limit=15.0 2024-08-14 03:56:15,466 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 03:56:25,346 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.373e+01 2.538e+01 2.756e+01 1.193e+02, threshold=5.077e+01, percent-clipped=2.0 2024-08-14 03:56:30,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2467520.0, ans=0.2 2024-08-14 03:56:42,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2467620.0, ans=0.125 2024-08-14 03:56:43,622 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-14 03:56:55,339 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 400, loss[loss=0.09468, beats_loss=0.01135, ecapa_loss=0.0001464, whisper_loss=0.08187, over 16688.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01046, ecapa_loss=0.0001604, whisper_loss=0.08938, over 3283494.24 frames. ], batch size: 64, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:56:57,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2467720.0, ans=0.125 2024-08-14 03:57:19,225 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 03:57:27,733 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.06 vs. 
limit=15.0 2024-08-14 03:57:51,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2468020.0, ans=0.0 2024-08-14 03:57:58,852 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.67 vs. limit=22.5 2024-08-14 03:58:03,214 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.267e-01 2024-08-14 03:58:09,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2468120.0, ans=0.0 2024-08-14 03:58:11,419 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 450, loss[loss=0.1162, beats_loss=0.007253, ecapa_loss=0.0001846, whisper_loss=0.1071, over 18299.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01052, ecapa_loss=0.0001597, whisper_loss=0.08913, over 3419007.67 frames. ], batch size: 72, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:58:28,091 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 27 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 03:58:34,782 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 03:58:37,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2468320.0, ans=0.125 2024-08-14 03:58:41,466 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.47 vs. limit=15.0 2024-08-14 03:58:50,395 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.78 vs. 
limit=22.5 2024-08-14 03:58:54,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2468420.0, ans=0.0 2024-08-14 03:58:56,276 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 26 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 03:58:57,396 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.249e+01 2.491e+01 2.829e+01 3.988e+01, threshold=4.982e+01, percent-clipped=0.0 2024-08-14 03:58:58,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2468520.0, ans=0.09899494936611666 2024-08-14 03:59:09,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2468520.0, ans=0.125 2024-08-14 03:59:28,827 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 500, loss[loss=0.1048, beats_loss=0.01133, ecapa_loss=0.0001625, whisper_loss=0.0918, over 17927.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0105, ecapa_loss=0.0001588, whisper_loss=0.08979, over 3529811.79 frames. ], batch size: 73, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:59:31,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2468720.0, ans=0.125 2024-08-14 03:59:42,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2468720.0, ans=0.0 2024-08-14 03:59:48,034 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 31 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 03:59:49,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2468820.0, ans=0.0 2024-08-14 03:59:54,538 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
31 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-14 03:59:57,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2468820.0, ans=0.2 2024-08-14 04:00:00,830 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0 2024-08-14 04:00:01,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2468920.0, ans=0.1 2024-08-14 04:00:19,015 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 04:00:21,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2469020.0, ans=0.0 2024-08-14 04:00:32,152 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 04:00:40,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2469120.0, ans=0.125 2024-08-14 04:00:45,696 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 550, loss[loss=0.08594, beats_loss=0.01416, ecapa_loss=0.0001319, whisper_loss=0.07046, over 22311.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01049, ecapa_loss=0.0001588, whisper_loss=0.09028, over 3626812.20 frames. ], batch size: 90, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:00:46,078 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-14 04:01:17,725 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
25 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-14 04:01:21,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2469420.0, ans=0.1 2024-08-14 04:01:25,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2469420.0, ans=0.2 2024-08-14 04:01:32,138 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.428e+01 2.764e+01 3.145e+01 1.301e+02, threshold=5.528e+01, percent-clipped=2.0 2024-08-14 04:01:33,255 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.11 vs. limit=15.0 2024-08-14 04:02:01,572 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 600, loss[loss=0.1147, beats_loss=0.008445, ecapa_loss=0.0001914, whisper_loss=0.1043, over 17492.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01047, ecapa_loss=0.0001594, whisper_loss=0.09062, over 3695390.98 frames. ], batch size: 69, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:02:03,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2469720.0, ans=0.125 2024-08-14 04:02:12,831 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.09 vs. limit=15.0 2024-08-14 04:02:19,286 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-14 04:02:22,864 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.66 vs. limit=12.0 2024-08-14 04:02:52,008 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
26 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 04:02:54,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2470020.0, ans=0.0 2024-08-14 04:02:58,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2470020.0, ans=0.0 2024-08-14 04:03:03,328 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.68 vs. limit=15.0 2024-08-14 04:03:04,027 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-14 04:03:06,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2470120.0, ans=0.95 2024-08-14 04:03:15,897 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 650, loss[loss=0.09975, beats_loss=0.009995, ecapa_loss=0.000126, whisper_loss=0.0885, over 18606.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01044, ecapa_loss=0.0001591, whisper_loss=0.09067, over 3749128.84 frames. ], batch size: 70, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:03:22,689 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.15 vs. limit=10.0 2024-08-14 04:04:02,211 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.415e+01 2.559e+01 3.017e+01 4.730e+01, threshold=5.119e+01, percent-clipped=1.0 2024-08-14 04:04:22,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2470620.0, ans=0.125 2024-08-14 04:04:32,382 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 700, loss[loss=0.1454, beats_loss=0.008142, ecapa_loss=0.0001364, whisper_loss=0.1359, over 16313.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01045, ecapa_loss=0.0001588, whisper_loss=0.09029, over 3729238.96 frames. ], batch size: 57, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:04:49,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2470820.0, ans=0.125 2024-08-14 04:04:50,874 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 04:04:53,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2470820.0, ans=0.2 2024-08-14 04:05:07,484 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.10 vs. limit=15.0 2024-08-14 04:05:27,698 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.28 vs. limit=10.0 2024-08-14 04:05:31,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2471120.0, ans=0.1 2024-08-14 04:05:33,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2471120.0, ans=0.0 2024-08-14 04:05:47,329 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 750, loss[loss=0.1078, beats_loss=0.009818, ecapa_loss=0.0001553, whisper_loss=0.09643, over 20976.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01048, ecapa_loss=0.0001579, whisper_loss=0.09018, over 3737515.43 frames. ], batch size: 82, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:06:00,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2471220.0, ans=0.0 2024-08-14 04:06:07,131 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
25 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-14 04:06:08,579 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 28 from LS+wenet, 11 from Vox, 17 fro AS 2024-08-14 04:06:19,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2471420.0, ans=0.0 2024-08-14 04:06:28,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2471420.0, ans=0.125 2024-08-14 04:06:31,912 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.269e+01 2.490e+01 2.820e+01 4.318e+01, threshold=4.979e+01, percent-clipped=0.0 2024-08-14 04:06:48,223 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.62 vs. limit=15.0 2024-08-14 04:06:53,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2471620.0, ans=0.0 2024-08-14 04:06:55,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2471620.0, ans=0.1 2024-08-14 04:07:01,935 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 800, loss[loss=0.08185, beats_loss=0.0106, ecapa_loss=0.0001355, whisper_loss=0.0699, over 16501.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.0001566, whisper_loss=0.09019, over 3771899.10 frames. ], batch size: 60, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:07:02,157 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-14 04:07:07,845 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.26 vs. 
limit=10.0 2024-08-14 04:07:15,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2471720.0, ans=0.1 2024-08-14 04:07:33,409 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.03 vs. limit=12.0 2024-08-14 04:07:39,563 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.27 vs. limit=22.5 2024-08-14 04:07:42,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2471920.0, ans=15.0 2024-08-14 04:07:48,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2472020.0, ans=0.07 2024-08-14 04:07:49,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2472020.0, ans=0.125 2024-08-14 04:07:56,714 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-14 04:08:14,492 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.24 vs. limit=15.0 2024-08-14 04:08:17,669 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 850, loss[loss=0.1076, beats_loss=0.007905, ecapa_loss=0.0001544, whisper_loss=0.09817, over 15280.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01047, ecapa_loss=0.0001564, whisper_loss=0.09003, over 3764569.55 frames. ], batch size: 58, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:08:23,431 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.16 vs. 
limit=15.0 2024-08-14 04:08:32,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2472320.0, ans=0.0 2024-08-14 04:08:48,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2472420.0, ans=0.0 2024-08-14 04:08:50,998 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-14 04:09:01,301 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.448e+01 2.671e+01 3.055e+01 4.887e+01, threshold=5.342e+01, percent-clipped=0.0 2024-08-14 04:09:01,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2472520.0, ans=0.125 2024-08-14 04:09:30,142 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 04:09:30,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2472620.0, ans=0.2 2024-08-14 04:09:33,194 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 900, loss[loss=0.1069, beats_loss=0.01152, ecapa_loss=0.0001354, whisper_loss=0.094, over 21050.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01058, ecapa_loss=0.000155, whisper_loss=0.08933, over 3791113.60 frames. ], batch size: 83, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:09:33,338 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-14 04:09:47,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2472820.0, ans=0.125 2024-08-14 04:09:50,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2472820.0, ans=0.0 2024-08-14 04:10:03,508 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
29 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-14 04:10:05,926 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.51 vs. limit=15.0 2024-08-14 04:10:19,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2473020.0, ans=0.0 2024-08-14 04:10:25,660 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.08 vs. limit=15.0 2024-08-14 04:10:31,247 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-14 04:10:32,966 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 04:10:39,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2473120.0, ans=0.05 2024-08-14 04:10:42,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2473120.0, ans=0.125 2024-08-14 04:10:48,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2473120.0, ans=0.125 2024-08-14 04:10:50,835 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 950, loss[loss=0.09766, beats_loss=0.008694, ecapa_loss=0.0001728, whisper_loss=0.08723, over 15738.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01058, ecapa_loss=0.0001542, whisper_loss=0.08898, over 3788745.34 frames. ], batch size: 61, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:10:51,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2473220.0, ans=0.125 2024-08-14 04:10:57,270 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
20 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-14 04:11:06,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2473320.0, ans=0.1 2024-08-14 04:11:14,890 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-14 04:11:35,313 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.279e+01 2.588e+01 3.016e+01 4.728e+01, threshold=5.177e+01, percent-clipped=0.0 2024-08-14 04:11:54,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2473620.0, ans=0.07 2024-08-14 04:12:04,801 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 1000, loss[loss=0.107, beats_loss=0.01074, ecapa_loss=0.0001176, whisper_loss=0.09508, over 19101.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01058, ecapa_loss=0.0001545, whisper_loss=0.08841, over 3786347.52 frames. ], batch size: 72, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:12:16,818 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.59 vs. limit=15.0 2024-08-14 04:12:19,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2473820.0, ans=0.125 2024-08-14 04:13:07,917 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 04:13:08,285 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0 2024-08-14 04:13:21,849 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 1050, loss[loss=0.1006, beats_loss=0.01305, ecapa_loss=0.0001384, whisper_loss=0.08614, over 22310.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01058, ecapa_loss=0.0001543, whisper_loss=0.08903, over 3804660.32 frames. 
], batch size: 88, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:13:37,113 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.12 vs. limit=12.0 2024-08-14 04:13:52,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2474420.0, ans=0.2 2024-08-14 04:13:53,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2474420.0, ans=0.0 2024-08-14 04:13:59,366 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.723e-02 2024-08-14 04:14:08,183 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.402e+01 2.807e+01 3.075e+01 7.896e+01, threshold=5.614e+01, percent-clipped=1.0 2024-08-14 04:14:21,878 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 30 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-14 04:14:22,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2474620.0, ans=0.2 2024-08-14 04:14:30,231 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.54 vs. limit=10.0 2024-08-14 04:14:37,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2474720.0, ans=0.125 2024-08-14 04:14:38,492 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 1100, loss[loss=0.09889, beats_loss=0.01085, ecapa_loss=0.0001753, whisper_loss=0.08629, over 16803.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0106, ecapa_loss=0.0001554, whisper_loss=0.08917, over 3835412.93 frames. 
], batch size: 68, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:14:59,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2474820.0, ans=0.0 2024-08-14 04:14:59,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2474820.0, ans=0.125 2024-08-14 04:15:07,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2474920.0, ans=0.125 2024-08-14 04:15:19,005 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 04:15:29,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2475020.0, ans=0.2 2024-08-14 04:15:37,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2475120.0, ans=0.2 2024-08-14 04:15:51,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2475220.0, ans=0.0 2024-08-14 04:15:52,709 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 1150, loss[loss=0.1159, beats_loss=0.01077, ecapa_loss=0.0001802, whisper_loss=0.1033, over 18923.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01061, ecapa_loss=0.0001554, whisper_loss=0.08954, over 3815876.07 frames. 
], batch size: 75, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:15:59,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2475220.0, ans=10.0 2024-08-14 04:16:05,475 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2475220.0, ans=0.0 2024-08-14 04:16:38,144 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.737e+01 2.338e+01 2.593e+01 2.937e+01 5.602e+01, threshold=5.186e+01, percent-clipped=0.0 2024-08-14 04:16:43,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2475520.0, ans=0.0 2024-08-14 04:16:54,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2475620.0, ans=0.1 2024-08-14 04:17:01,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2475620.0, ans=0.125 2024-08-14 04:17:04,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2475620.0, ans=0.125 2024-08-14 04:17:07,624 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 1200, loss[loss=0.1024, beats_loss=0.01098, ecapa_loss=0.0001656, whisper_loss=0.08976, over 22075.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0107, ecapa_loss=0.0001558, whisper_loss=0.08948, over 3826721.26 frames. ], batch size: 89, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:17:14,334 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.99 vs. 
limit=15.0 2024-08-14 04:17:54,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2476020.0, ans=0.1 2024-08-14 04:17:58,319 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.55 vs. limit=15.0 2024-08-14 04:18:08,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2476120.0, ans=0.125 2024-08-14 04:18:13,834 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 13 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-14 04:18:15,947 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.20 vs. limit=10.0 2024-08-14 04:18:21,523 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 1250, loss[loss=0.08346, beats_loss=0.0126, ecapa_loss=0.0001422, whisper_loss=0.06943, over 22165.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01075, ecapa_loss=0.0001546, whisper_loss=0.08912, over 3845407.33 frames. ], batch size: 89, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:18:23,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2476220.0, ans=0.1 2024-08-14 04:18:25,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2476220.0, ans=0.125 2024-08-14 04:18:33,437 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.63 vs. 
limit=22.5 2024-08-14 04:18:45,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2476320.0, ans=0.0 2024-08-14 04:18:45,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2476320.0, ans=0.125 2024-08-14 04:18:46,639 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-14 04:19:02,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2476420.0, ans=0.0 2024-08-14 04:19:07,202 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.365e+01 2.557e+01 2.889e+01 4.348e+01, threshold=5.114e+01, percent-clipped=0.0 2024-08-14 04:19:08,941 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-14 04:19:13,889 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 31 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 04:19:18,335 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 20 from LS+wenet, 19 from Vox, 54 fro AS 2024-08-14 04:19:29,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2476620.0, ans=0.125 2024-08-14 04:19:30,319 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 22 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-14 04:19:35,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2476620.0, ans=0.0 2024-08-14 04:19:38,279 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 1300, loss[loss=0.1007, beats_loss=0.01043, ecapa_loss=0.0001499, whisper_loss=0.08874, over 22245.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01073, ecapa_loss=0.0001545, whisper_loss=0.08904, over 3857917.58 frames. 
], batch size: 88, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:19:45,247 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 19 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-14 04:19:57,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2476820.0, ans=0.025 2024-08-14 04:20:05,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2476820.0, ans=0.1 2024-08-14 04:20:29,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2477020.0, ans=0.125 2024-08-14 04:20:29,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2477020.0, ans=0.2 2024-08-14 04:20:52,933 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 15 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-14 04:20:55,405 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 1350, loss[loss=0.1076, beats_loss=0.01146, ecapa_loss=0.0001808, whisper_loss=0.09432, over 21699.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01062, ecapa_loss=0.0001538, whisper_loss=0.08974, over 3852189.66 frames. ], batch size: 92, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:21:02,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2477220.0, ans=0.05 2024-08-14 04:21:29,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2477420.0, ans=0.125 2024-08-14 04:21:32,450 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
28 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-14 04:21:32,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2477420.0, ans=0.1 2024-08-14 04:21:35,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2477420.0, ans=0.125 2024-08-14 04:21:41,220 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.297e+01 2.516e+01 2.764e+01 4.025e+01, threshold=5.033e+01, percent-clipped=0.0 2024-08-14 04:21:43,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2477520.0, ans=0.2 2024-08-14 04:21:44,704 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-14 04:22:11,518 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 1400, loss[loss=0.1212, beats_loss=0.007948, ecapa_loss=0.0001649, whisper_loss=0.1116, over 18908.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01064, ecapa_loss=0.0001528, whisper_loss=0.08988, over 3858326.60 frames. ], batch size: 72, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:22:13,554 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 21 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-14 04:22:16,460 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-14 04:22:26,086 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.57 vs. limit=15.0 2024-08-14 04:22:27,356 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.65 vs. 
limit=15.0 2024-08-14 04:22:45,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2477920.0, ans=0.0 2024-08-14 04:22:55,918 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-14 04:23:04,842 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 04:23:14,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2478120.0, ans=0.2 2024-08-14 04:23:20,738 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.96 vs. limit=10.0 2024-08-14 04:24:06,584 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 1450, loss[loss=0.1169, beats_loss=0.008114, ecapa_loss=0.0001779, whisper_loss=0.107, over 20704.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01069, ecapa_loss=0.0001521, whisper_loss=0.08955, over 3839568.55 frames. ], batch size: 80, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:24:11,704 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-14 04:24:24,556 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.356e+01 2024-08-14 04:24:27,872 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 04:24:39,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2478420.0, ans=0.1 2024-08-14 04:24:41,223 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.54 vs. limit=15.0 2024-08-14 04:24:54,409 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
23 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-14 04:24:55,520 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.310e+01 2.554e+01 2.920e+01 4.164e+01, threshold=5.108e+01, percent-clipped=0.0 2024-08-14 04:24:59,930 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 04:25:03,590 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.55 vs. limit=10.0 2024-08-14 04:25:04,989 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.56 vs. limit=22.5 2024-08-14 04:25:14,341 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=15.0 2024-08-14 04:25:29,151 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 1500, loss[loss=0.1029, beats_loss=0.009799, ecapa_loss=0.0001428, whisper_loss=0.09163, over 22191.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01074, ecapa_loss=0.000152, whisper_loss=0.08931, over 3860722.80 frames. ], batch size: 86, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:25:29,885 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.21 vs. limit=15.0 2024-08-14 04:25:35,531 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.73 vs. 
limit=15.0 2024-08-14 04:25:44,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2478820.0, ans=0.125 2024-08-14 04:26:25,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2479020.0, ans=0.09899494936611666 2024-08-14 04:26:28,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2479020.0, ans=0.125 2024-08-14 04:26:30,814 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-14 04:26:31,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2479020.0, ans=0.125 2024-08-14 04:26:32,399 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-14 04:26:38,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2479120.0, ans=0.125 2024-08-14 04:26:38,417 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 04:26:41,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2479120.0, ans=0.0 2024-08-14 04:26:50,160 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 1550, loss[loss=0.1115, beats_loss=0.008371, ecapa_loss=0.0001657, whisper_loss=0.1014, over 16978.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01076, ecapa_loss=0.0001514, whisper_loss=0.08878, over 3849666.05 frames. 
], batch size: 65, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:27:01,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2479220.0, ans=0.0 2024-08-14 04:27:29,384 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 04:27:29,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2479420.0, ans=0.125 2024-08-14 04:27:39,076 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.208e+01 2.513e+01 2.710e+01 4.785e+01, threshold=5.026e+01, percent-clipped=0.0 2024-08-14 04:27:54,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2479620.0, ans=0.0 2024-08-14 04:28:03,333 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 13 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-14 04:28:05,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2479620.0, ans=0.1 2024-08-14 04:28:10,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2479720.0, ans=0.0 2024-08-14 04:28:10,690 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=15.0 2024-08-14 04:28:11,011 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 1600, loss[loss=0.09467, beats_loss=0.008556, ecapa_loss=0.0001888, whisper_loss=0.08423, over 16118.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01075, ecapa_loss=0.000151, whisper_loss=0.08813, over 3806062.21 frames. 
], batch size: 68, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:28:28,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2479820.0, ans=0.0 2024-08-14 04:28:37,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2479820.0, ans=0.125 2024-08-14 04:29:03,404 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.06 vs. limit=15.0 2024-08-14 04:29:09,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2480020.0, ans=0.125 2024-08-14 04:29:13,257 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.46 vs. limit=22.5 2024-08-14 04:29:14,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2480020.0, ans=0.125 2024-08-14 04:29:17,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2480120.0, ans=0.0 2024-08-14 04:29:19,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2480120.0, ans=10.0 2024-08-14 04:29:19,645 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 04:29:23,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2480120.0, ans=0.0 2024-08-14 04:29:31,730 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 1650, loss[loss=0.1187, beats_loss=0.01098, ecapa_loss=0.0001468, whisper_loss=0.1063, over 21565.00 frames. 
], tot_loss[loss=0.1006, beats_loss=0.01077, ecapa_loss=0.0001514, whisper_loss=0.08826, over 3832933.75 frames. ], batch size: 88, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:29:45,636 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-14 04:29:48,477 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-14 04:29:48,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2480320.0, ans=0.0 2024-08-14 04:29:54,162 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 04:29:55,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2480320.0, ans=0.0 2024-08-14 04:29:57,982 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.82 vs. limit=15.0 2024-08-14 04:30:01,623 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-14 04:30:13,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2480420.0, ans=0.125 2024-08-14 04:30:17,119 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.355e+01 2.575e+01 2.902e+01 4.492e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-14 04:30:17,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2480520.0, ans=0.0 2024-08-14 04:30:23,902 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.20 vs. 
limit=15.0 2024-08-14 04:30:26,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2480520.0, ans=0.015 2024-08-14 04:30:29,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2480520.0, ans=0.2 2024-08-14 04:30:38,640 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.50 vs. limit=15.0 2024-08-14 04:30:39,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2480620.0, ans=0.125 2024-08-14 04:30:46,901 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 1700, loss[loss=0.08553, beats_loss=0.01269, ecapa_loss=0.0001528, whisper_loss=0.07131, over 16437.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01077, ecapa_loss=0.0001516, whisper_loss=0.08867, over 3804190.26 frames. ], batch size: 64, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:31:05,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2480820.0, ans=0.125 2024-08-14 04:31:08,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2480820.0, ans=0.125 2024-08-14 04:31:33,880 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 22 from Vox, 17 fro AS 2024-08-14 04:32:00,346 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 1750, loss[loss=0.1139, beats_loss=0.01078, ecapa_loss=0.0001749, whisper_loss=0.1014, over 22754.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01076, ecapa_loss=0.0001511, whisper_loss=0.08842, over 3825791.11 frames. ], batch size: 93, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:32:06,495 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
11 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 04:32:07,131 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.19 vs. limit=6.0 2024-08-14 04:32:14,941 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.44 vs. limit=15.0 2024-08-14 04:32:26,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2481320.0, ans=0.125 2024-08-14 04:32:44,293 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.328e+01 2.583e+01 3.000e+01 1.080e+02, threshold=5.167e+01, percent-clipped=1.0 2024-08-14 04:32:44,549 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-14 04:32:53,925 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.83 vs. limit=12.0 2024-08-14 04:33:13,301 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 1800, loss[loss=0.1043, beats_loss=0.01019, ecapa_loss=0.0001865, whisper_loss=0.09225, over 15227.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01069, ecapa_loss=0.0001514, whisper_loss=0.08858, over 3844326.51 frames. ], batch size: 61, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:33:18,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2481720.0, ans=0.125 2024-08-14 04:33:33,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=2481820.0, ans=0.5 2024-08-14 04:33:39,619 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.63 vs. 
limit=15.0 2024-08-14 04:33:49,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2481920.0, ans=0.2 2024-08-14 04:33:49,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2481920.0, ans=0.035 2024-08-14 04:33:53,114 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 29 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-14 04:33:56,411 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-14 04:34:00,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2482020.0, ans=0.125 2024-08-14 04:34:05,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2482020.0, ans=0.025 2024-08-14 04:34:08,336 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-14 04:34:17,055 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 22 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 04:34:27,621 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 1850, loss[loss=0.1006, beats_loss=0.01076, ecapa_loss=0.0001748, whisper_loss=0.08811, over 20744.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0107, ecapa_loss=0.0001519, whisper_loss=0.08873, over 3847096.70 frames. 
], batch size: 86, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:34:34,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2482220.0, ans=0.0 2024-08-14 04:34:34,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2482220.0, ans=0.125 2024-08-14 04:34:40,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2482220.0, ans=0.125 2024-08-14 04:34:46,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2482320.0, ans=0.125 2024-08-14 04:34:51,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2482320.0, ans=0.0 2024-08-14 04:34:55,367 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 10 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-14 04:34:59,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2482420.0, ans=0.1 2024-08-14 04:35:06,730 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 04:35:09,747 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-14 04:35:11,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2482420.0, ans=0.1 2024-08-14 04:35:13,763 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.002e+01 2.339e+01 2.610e+01 2.958e+01 9.834e+01, threshold=5.220e+01, percent-clipped=1.0 2024-08-14 04:35:16,519 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.94 vs. 
limit=15.0 2024-08-14 04:35:21,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2482520.0, ans=0.0 2024-08-14 04:35:22,361 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 04:35:44,947 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 1900, loss[loss=0.1112, beats_loss=0.009965, ecapa_loss=0.0001506, whisper_loss=0.09972, over 17167.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01067, ecapa_loss=0.0001528, whisper_loss=0.08968, over 3878421.59 frames. ], batch size: 65, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:36:00,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2482820.0, ans=0.2 2024-08-14 04:36:24,622 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-14 04:36:37,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2483020.0, ans=0.125 2024-08-14 04:36:40,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2483020.0, ans=0.125 2024-08-14 04:36:43,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2483020.0, ans=0.0 2024-08-14 04:36:44,692 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 22 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-14 04:36:46,044 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 04:36:49,258 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
26 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 04:37:00,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2483220.0, ans=0.1 2024-08-14 04:37:01,341 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 1950, loss[loss=0.0983, beats_loss=0.009415, ecapa_loss=0.0001765, whisper_loss=0.08712, over 21050.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01074, ecapa_loss=0.0001524, whisper_loss=0.08927, over 3875470.79 frames. ], batch size: 85, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:37:12,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2483220.0, ans=0.125 2024-08-14 04:37:21,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2483320.0, ans=0.125 2024-08-14 04:37:43,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2483420.0, ans=0.125 2024-08-14 04:37:46,529 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.351e+01 2.542e+01 2.768e+01 3.987e+01, threshold=5.084e+01, percent-clipped=0.0 2024-08-14 04:37:54,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2483520.0, ans=0.125 2024-08-14 04:38:14,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2483620.0, ans=0.0 2024-08-14 04:38:16,662 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 2000, loss[loss=0.08807, beats_loss=0.01038, ecapa_loss=0.0001325, whisper_loss=0.07637, over 15826.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01064, ecapa_loss=0.0001518, whisper_loss=0.08985, over 3842270.69 frames. 
], batch size: 59, lr: 3.51e-03, grad_scale: 5.764607523034235e+17
2024-08-14 04:38:20,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2483720.0, ans=0.125
2024-08-14 04:38:23,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2483720.0, ans=0.0
2024-08-14 04:38:38,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2483820.0, ans=0.0
2024-08-14 04:38:49,482 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 21 from Vox, 32 from AS
2024-08-14 04:38:49,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2483920.0, ans=0.125
2024-08-14 04:38:51,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2483920.0, ans=0.2
2024-08-14 04:38:52,294 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 20 from Vox, 44 from AS
2024-08-14 04:39:00,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2483920.0, ans=0.125
2024-08-14 04:39:15,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2484020.0, ans=0.1
2024-08-14 04:39:21,478 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0
2024-08-14 04:39:25,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2484120.0, ans=0.0
2024-08-14 04:39:35,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2484120.0, ans=0.0
2024-08-14 04:39:37,827 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 2050, loss[loss=0.1174, beats_loss=0.01061, ecapa_loss=0.000146, whisper_loss=0.1053, over 22742.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01063, ecapa_loss=0.0001519, whisper_loss=0.09045, over 3869680.28 frames. ], batch size: 87, lr: 3.51e-03, grad_scale: 5.764607523034235e+17
2024-08-14 04:39:41,014 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0
2024-08-14 04:39:54,129 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 21 from Vox, 31 from AS
2024-08-14 04:40:13,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2484420.0, ans=0.0
2024-08-14 04:40:17,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=2484420.0, ans=0.05
2024-08-14 04:40:17,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2484420.0, ans=0.0
2024-08-14 04:40:25,623 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.087e+01 2.326e+01 2.679e+01 3.072e+01 5.038e+01, threshold=5.357e+01, percent-clipped=0.0
2024-08-14 04:40:25,916 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 21 from Vox, 30 from AS
2024-08-14 04:40:49,169 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 23 from Vox, 32 from AS
2024-08-14 04:40:57,139 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 2100, loss[loss=0.09612, beats_loss=0.01043, ecapa_loss=9.914e-05, whisper_loss=0.0847, over 15616.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01066, ecapa_loss=0.0001506, whisper_loss=0.09045, over 3841383.95 frames. ], batch size: 57, lr: 3.51e-03, grad_scale: 5.764607523034235e+17
2024-08-14 04:41:10,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2484820.0, ans=0.0
2024-08-14 04:41:32,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2484920.0, ans=0.2
2024-08-14 04:42:02,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2485120.0, ans=0.125
2024-08-14 04:42:08,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2485120.0, ans=0.2
2024-08-14 04:42:15,267 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 2150, loss[loss=0.1252, beats_loss=0.008607, ecapa_loss=0.0001542, whisper_loss=0.1151, over 22158.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01065, ecapa_loss=0.0001514, whisper_loss=0.09072, over 3845901.04 frames. ], batch size: 80, lr: 3.51e-03, grad_scale: 5.764607523034235e+17
2024-08-14 04:42:18,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2485220.0, ans=0.07
2024-08-14 04:42:18,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2485220.0, ans=10.0
2024-08-14 04:42:25,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2485220.0, ans=0.125
2024-08-14 04:42:31,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2485320.0, ans=0.0
2024-08-14 04:42:33,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2485320.0, ans=0.125
2024-08-14 04:42:52,519 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 23 from Vox, 19 from AS
2024-08-14 04:43:04,032 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.306e+01 2.493e+01 2.947e+01 5.632e+01, threshold=4.986e+01, percent-clipped=1.0
2024-08-14 04:43:19,693 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 18 from LS+wenet, 13 from Vox, 22 from AS
2024-08-14 04:43:23,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2485620.0, ans=0.125
2024-08-14 04:43:23,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2485620.0, ans=0.125
2024-08-14 04:43:27,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2485620.0, ans=0.125
2024-08-14 04:43:35,090 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 2200, loss[loss=0.08823, beats_loss=0.01107, ecapa_loss=0.0001665, whisper_loss=0.07549, over 14344.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01074, ecapa_loss=0.0001518, whisper_loss=0.09062, over 3858901.37 frames. ], batch size: 60, lr: 3.51e-03, grad_scale: 5.764607523034235e+17
2024-08-14 04:43:37,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2485720.0, ans=0.125
2024-08-14 04:43:40,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2485720.0, ans=0.125
2024-08-14 04:44:01,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2485820.0, ans=0.125
2024-08-14 04:44:04,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2485820.0, ans=0.0
2024-08-14 04:44:38,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2486120.0, ans=0.125
2024-08-14 04:44:38,510 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0
2024-08-14 04:44:40,602 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 from AS
2024-08-14 04:44:45,823 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 18 from Vox, 29 from AS
2024-08-14 04:44:54,444 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 2250, loss[loss=0.09279, beats_loss=0.01128, ecapa_loss=0.0001284, whisper_loss=0.08022, over 14887.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01084, ecapa_loss=0.0001521, whisper_loss=0.09111, over 3895295.88 frames. ], batch size: 59, lr: 3.51e-03, grad_scale: 5.764607523034235e+17
2024-08-14 04:45:00,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2486220.0, ans=0.125
2024-08-14 04:45:02,358 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.68 vs. limit=15.0
2024-08-14 04:45:07,794 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 15 from LS+wenet, 19 from Vox, 25 from AS
2024-08-14 04:45:40,668 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0
2024-08-14 04:45:42,493 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.436e+01 2.743e+01 3.250e+01 7.629e+01, threshold=5.485e+01, percent-clipped=1.0
2024-08-14 04:45:54,227 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 22 from Vox, 28 from AS
2024-08-14 04:45:54,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2486520.0, ans=0.2
2024-08-14 04:46:15,014 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 2300, loss[loss=0.09543, beats_loss=0.01096, ecapa_loss=0.0001642, whisper_loss=0.08283, over 21425.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01078, ecapa_loss=0.0001528, whisper_loss=0.09126, over 3898497.61 frames. ], batch size: 91, lr: 3.51e-03, grad_scale: 5.764607523034235e+17
2024-08-14 04:46:17,034 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 15 from Vox, 43 from AS
2024-08-14 04:46:28,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2486720.0, ans=0.0
2024-08-14 04:46:33,604 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0
2024-08-14 04:46:37,840 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 34 from LS+wenet, 23 from Vox, 38 from AS
2024-08-14 04:46:41,987 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 19 from Vox, 36 from AS
2024-08-14 04:46:45,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2486920.0, ans=0.1
2024-08-14 04:47:21,181 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0
2024-08-14 04:47:25,145 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 20 from LS+wenet, 23 from Vox, 40 from AS
2024-08-14 04:47:34,596 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 2350, loss[loss=0.08312, beats_loss=0.008583, ecapa_loss=0.0002527, whisper_loss=0.07201, over 13277.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01067, ecapa_loss=0.0001544, whisper_loss=0.09184, over 3890338.12 frames. ], batch size: 59, lr: 3.51e-03, grad_scale: 5.764607523034235e+17
2024-08-14 04:48:05,966 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 30 from LS+wenet, 20 from Vox, 30 from AS
2024-08-14 04:48:09,193 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.61 vs. limit=15.0
2024-08-14 04:48:21,016 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 21 from Vox, 39 from AS
2024-08-14 04:48:22,143 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.598e+01 2.359e+01 2.626e+01 3.027e+01 4.535e+02, threshold=5.251e+01, percent-clipped=2.0
2024-08-14 04:48:48,890 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.73 vs. limit=15.0
2024-08-14 04:48:55,359 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 2400, loss[loss=0.09171, beats_loss=0.01194, ecapa_loss=0.0001481, whisper_loss=0.07829, over 16430.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01059, ecapa_loss=0.0001546, whisper_loss=0.09183, over 3913612.11 frames. ], batch size: 68, lr: 3.51e-03, grad_scale: 5.764607523034235e+17
2024-08-14 04:48:56,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2487720.0, ans=0.125
2024-08-14 04:49:07,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2487720.0, ans=0.125
2024-08-14 04:49:33,864 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 17 from Vox, 21 from AS
2024-08-14 04:49:59,710 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 15 from Vox, 23 from AS
2024-08-14 04:50:04,615 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 from AS
2024-08-14 04:50:04,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2488120.0, ans=0.1
2024-08-14 04:50:06,618 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 17 from Vox, 35 from AS
2024-08-14 04:50:14,019 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 2450, loss[loss=0.08884, beats_loss=0.01054, ecapa_loss=0.0001583, whisper_loss=0.07673, over 15887.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01059, ecapa_loss=0.0001558, whisper_loss=0.09137, over 3915508.83 frames. ], batch size: 64, lr: 3.51e-03, grad_scale: 5.764607523034235e+17
2024-08-14 04:50:21,782 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 23 from LS+wenet, 32 from Vox, 40 from AS
2024-08-14 04:50:22,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2488220.0, ans=0.125
2024-08-14 04:50:25,382 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.31 vs. limit=12.0
2024-08-14 04:50:28,095 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 28 from LS+wenet, 20 from Vox, 28 from AS
2024-08-14 04:50:31,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2488320.0, ans=0.125
2024-08-14 04:50:38,745 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 20 from LS+wenet, 22 from Vox, 50 from AS
2024-08-14 04:50:41,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2488320.0, ans=0.0
2024-08-14 04:50:58,052 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 27 from Vox, 30 from AS
2024-08-14 04:50:58,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2488420.0, ans=0.1
2024-08-14 04:51:00,800 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.341e+01 2.556e+01 2.864e+01 5.420e+01, threshold=5.112e+01, percent-clipped=1.0
2024-08-14 04:51:02,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2488520.0, ans=0.0
2024-08-14 04:51:04,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2488520.0, ans=0.125
2024-08-14 04:51:07,448 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 18 from Vox, 44 from AS
2024-08-14 04:51:29,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2488620.0, ans=0.0
2024-08-14 04:51:32,010 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 2500, loss[loss=0.1141, beats_loss=0.00957, ecapa_loss=0.0001641, whisper_loss=0.1029, over 14558.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01058, ecapa_loss=0.0001564, whisper_loss=0.09111, over 3922899.01 frames. ], batch size: 60, lr: 3.51e-03, grad_scale: 5.764607523034235e+17
2024-08-14 04:51:43,411 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.89 vs. limit=15.0
2024-08-14 04:52:19,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2489020.0, ans=0.0
2024-08-14 04:52:28,913 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.80 vs. limit=15.0
2024-08-14 04:52:33,874 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-14 04:52:34,839 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 16 from Vox, 32 from AS
2024-08-14 04:52:42,449 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 from AS
2024-08-14 04:52:44,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2489120.0, ans=0.0
2024-08-14 04:52:53,018 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 2550, loss[loss=0.08936, beats_loss=0.01141, ecapa_loss=0.0001773, whisper_loss=0.07618, over 15581.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01065, ecapa_loss=0.0001557, whisper_loss=0.09044, over 3911510.54 frames. ], batch size: 64, lr: 3.51e-03, grad_scale: 5.764607523034235e+17
2024-08-14 04:53:19,351 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 25 from Vox, 27 from AS
2024-08-14 04:53:27,794 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 22 from LS+wenet, 25 from Vox, 40 from AS
2024-08-14 04:53:34,208 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-14 04:53:43,372 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.452e+01 2.668e+01 3.104e+01 5.723e+01, threshold=5.337e+01, percent-clipped=1.0
2024-08-14 04:54:06,354 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0
2024-08-14 04:54:09,311 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.85 vs. limit=10.0
2024-08-14 04:54:14,165 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 2600, loss[loss=0.1081, beats_loss=0.009878, ecapa_loss=0.0001855, whisper_loss=0.09642, over 21965.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0106, ecapa_loss=0.0001559, whisper_loss=0.09054, over 3893367.38 frames. ], batch size: 90, lr: 3.51e-03, grad_scale: 5.764607523034235e+17
2024-08-14 04:54:15,807 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 17 from LS+wenet, 27 from Vox, 45 from AS
2024-08-14 04:54:29,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2489720.0, ans=0.125
2024-08-14 04:54:39,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2489820.0, ans=0.125
2024-08-14 04:54:48,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2489820.0, ans=0.125
2024-08-14 04:54:58,643 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.08 vs. limit=15.0
2024-08-14 04:55:13,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2490020.0, ans=0.125
2024-08-14 04:55:18,728 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.60 vs. limit=15.0
2024-08-14 04:55:19,414 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 12 from LS+wenet, 18 from Vox, 23 from AS
2024-08-14 04:55:38,047 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 12 from Vox, 34 from AS
2024-08-14 04:55:39,676 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 20 from Vox, 28 from AS
2024-08-14 04:55:40,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2490120.0, ans=0.125
2024-08-14 04:55:45,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2490120.0, ans=0.0
2024-08-14 04:55:51,152 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 2650, loss[loss=0.08304, beats_loss=0.01192, ecapa_loss=0.0001664, whisper_loss=0.06945, over 16318.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01067, ecapa_loss=0.000155, whisper_loss=0.0902, over 3907452.67 frames. ], batch size: 65, lr: 3.51e-03, grad_scale: 5.764607523034235e+17
2024-08-14 04:55:51,468 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 25 from LS+wenet, 12 from Vox, 28 from AS
2024-08-14 04:56:33,176 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 38 from LS+wenet, 20 from Vox, 33 from AS
2024-08-14 04:56:43,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2490420.0, ans=0.1
2024-08-14 04:56:44,278 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 27 from LS+wenet, 12 from Vox, 41 from AS
2024-08-14 04:56:45,316 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.391e+01 2.607e+01 2.986e+01 4.430e+01, threshold=5.214e+01, percent-clipped=0.0
2024-08-14 04:56:47,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2490520.0, ans=0.125
2024-08-14 04:56:57,801 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0513666495680809, model_norm_threshold=52.13920593261719
2024-08-14 04:56:57,966 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.106e+05, grad_sumsq=1.106e+05, orig_rms_sq=1.000e+00
2024-08-14 04:57:15,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2490620.0, ans=0.1
2024-08-14 04:57:18,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2490620.0, ans=0.125
2024-08-14 04:57:22,853 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 31 from Vox, 31 from AS
2024-08-14 04:57:26,891 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 2700, loss[loss=0.1068, beats_loss=0.01003, ecapa_loss=0.0001702, whisper_loss=0.0951, over 21290.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01073, ecapa_loss=0.0001554, whisper_loss=0.08984, over 3890942.56 frames. ], batch size: 88, lr: 3.51e-03, grad_scale: 5.764607523034235e+17
2024-08-14 04:57:37,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2490720.0, ans=0.125
2024-08-14 04:57:42,888 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=15.0
2024-08-14 04:57:44,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2490720.0, ans=0.125
2024-08-14 04:58:14,850 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0
2024-08-14 04:58:47,617 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 26 from Vox, 43 from AS
2024-08-14 04:58:52,987 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.42 vs. limit=15.0
2024-08-14 04:59:26,946 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 2750, loss[loss=0.08856, beats_loss=0.01185, ecapa_loss=0.000128, whisper_loss=0.07543, over 14627.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01078, ecapa_loss=0.0001555, whisper_loss=0.08998, over 3874618.71 frames. ], batch size: 57, lr: 3.51e-03, grad_scale: 5.764607523034235e+17
2024-08-14 04:59:31,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2491220.0, ans=0.125
2024-08-14 04:59:37,674 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.66 vs. limit=6.0
2024-08-14 04:59:38,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2491220.0, ans=0.1
2024-08-14 04:59:51,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2491320.0, ans=0.125
2024-08-14 05:00:27,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2491420.0, ans=0.0
2024-08-14 05:00:36,960 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 38 from LS+wenet, 16 from Vox, 34 from AS
2024-08-14 05:00:37,917 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.424e+01 2.607e+01 2.892e+01 1.015e+03, threshold=5.215e+01, percent-clipped=3.0
2024-08-14 05:00:42,212 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 15 from Vox, 26 from AS
2024-08-14 05:01:06,416 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.23 vs. limit=12.0
2024-08-14 05:01:19,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2491620.0, ans=0.1
2024-08-14 05:01:21,605 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.87 vs. limit=15.0
2024-08-14 05:01:27,201 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 2800, loss[loss=0.1029, beats_loss=0.01257, ecapa_loss=0.0001335, whisper_loss=0.08896, over 15645.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01074, ecapa_loss=0.0001541, whisper_loss=0.09098, over 3875954.50 frames. ], batch size: 61, lr: 3.51e-03, grad_scale: 5.764607523034235e+17
2024-08-14 05:01:32,418 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 from AS
2024-08-14 05:01:49,086 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 19 from LS+wenet, 22 from Vox, 51 from AS
2024-08-14 05:01:58,046 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 from AS
2024-08-14 05:02:03,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2491820.0, ans=0.2
2024-08-14 05:02:37,848 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 14 from Vox, 38 from AS
2024-08-14 05:02:45,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2492020.0, ans=0.125
2024-08-14 05:02:58,367 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 from AS
2024-08-14 05:03:08,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2492120.0, ans=0.125
2024-08-14 05:03:18,558 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 24 from Vox, 29 from AS
2024-08-14 05:03:20,965 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 2850, loss[loss=0.09579, beats_loss=0.01177, ecapa_loss=0.0001513, whisper_loss=0.0825, over 19991.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01091, ecapa_loss=0.0001531, whisper_loss=0.09039, over 3896991.16 frames. ], batch size: 84, lr: 3.50e-03, grad_scale: 5.764607523034235e+17
2024-08-14 05:03:42,671 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 22 from LS+wenet, 22 from Vox, 39 from AS
2024-08-14 05:03:48,478 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 22 from Vox, 36 from AS
2024-08-14 05:03:53,319 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 24 from Vox, 21 from AS
2024-08-14 05:03:54,725 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 19 from Vox, 44 from AS
2024-08-14 05:03:57,487 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 23 from LS+wenet, 19 from Vox, 18 from AS
2024-08-14 05:04:02,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2492420.0, ans=0.125
2024-08-14 05:04:06,098 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.327e+01 2.505e+01 2.806e+01 7.430e+01, threshold=5.010e+01, percent-clipped=1.0
2024-08-14 05:04:27,795 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=15.0
2024-08-14 05:04:37,426 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 2900, loss[loss=0.1098, beats_loss=0.0115, ecapa_loss=0.0001428, whisper_loss=0.09683, over 23045.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01086, ecapa_loss=0.0001555, whisper_loss=0.09059, over 3891691.29 frames. ], batch size: 92, lr: 3.50e-03, grad_scale: 5.764607523034235e+17
2024-08-14 05:04:49,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2492720.0, ans=0.125
2024-08-14 05:04:54,461 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.17 vs. limit=15.0
2024-08-14 05:04:57,218 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.51 vs. limit=6.0
2024-08-14 05:04:59,842 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 from AS
2024-08-14 05:05:04,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2492820.0, ans=0.0
2024-08-14 05:05:11,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2492920.0, ans=0.125
2024-08-14 05:05:13,248 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 32 from Vox, 38 from AS
2024-08-14 05:05:22,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2493020.0, ans=0.125
2024-08-14 05:05:29,478 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 20 from LS+wenet, 22 from Vox, 39 from AS
2024-08-14 05:05:42,546 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 22 from Vox, 44 from AS
2024-08-14 05:05:48,874 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 31 from LS+wenet, 19 from Vox, 25 from AS
2024-08-14 05:05:52,692 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 2950, loss[loss=0.1245, beats_loss=0.00802, ecapa_loss=0.0001399, whisper_loss=0.1151, over 22741.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01074, ecapa_loss=0.000157, whisper_loss=0.09097, over 3935680.67 frames. ], batch size: 85, lr: 3.50e-03, grad_scale: 5.764607523034235e+17
2024-08-14 05:05:55,750 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 20 from Vox, 34 from AS
2024-08-14 05:06:11,493 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=22.5
2024-08-14 05:06:18,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2493320.0, ans=0.125
2024-08-14 05:06:30,739 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 16 from Vox, 22 from AS
2024-08-14 05:06:33,399 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 38 from LS+wenet, 26 from Vox, 29 from AS
2024-08-14 05:06:34,559 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.417e+01 2.624e+01 2.963e+01 8.640e+01, threshold=5.248e+01, percent-clipped=1.0
2024-08-14 05:06:55,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2493620.0, ans=0.1
2024-08-14 05:07:02,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2493720.0, ans=0.125
2024-08-14 05:07:03,628 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 3000, loss[loss=0.112, beats_loss=0.01032, ecapa_loss=0.0001621, whisper_loss=0.1001, over 20432.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01071, ecapa_loss=0.0001572, whisper_loss=0.0909, over 3951882.88 frames. ], batch size: 83, lr: 3.50e-03, grad_scale: 5.764607523034235e+17
2024-08-14 05:07:03,629 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss
2024-08-14 05:07:44,596 INFO [train_multi_KD3.py:1149] (2/4) Epoch 18, validation on ASR_libri: loss=0.2518, beats_loss=0, ecapa_loss=0.0005463, whisper_loss=0.2464, over 922467.00 frames.
2024-08-14 05:08:00,235 INFO [train_multi_KD3.py:1149] (2/4) Epoch 18, validation on SV_voxceleb1: loss=0.004304, beats_loss=0, ecapa_loss=0.0004304, whisper_loss=0, over 939242.00 frames.
2024-08-14 05:10:04,679 INFO [train_multi_KD3.py:1149] (2/4) Epoch 18, validation on AT_audioset: loss=0.02354, beats_loss=0.02354, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-14 05:10:04,683 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-14 05:10:05,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2493720.0, ans=0.0 2024-08-14 05:10:17,256 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0 2024-08-14 05:10:31,677 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.35 vs. limit=22.5 2024-08-14 05:10:35,236 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-14 05:10:37,815 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.08 vs. limit=10.0 2024-08-14 05:10:47,986 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.73 vs. limit=6.0 2024-08-14 05:10:58,125 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.78 vs. limit=10.0 2024-08-14 05:11:03,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2494120.0, ans=0.125 2024-08-14 05:11:04,616 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-14 05:11:17,154 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 3050, loss[loss=0.1073, beats_loss=0.008676, ecapa_loss=0.0001885, whisper_loss=0.09673, over 21841.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01063, ecapa_loss=0.0001583, whisper_loss=0.09154, over 3973744.44 frames. 
], batch size: 94, lr: 3.50e-03, grad_scale: 5.764607523034235e+17
2024-08-14 05:11:38,245 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.41 vs. limit=15.0
2024-08-14 05:11:44,436 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 19 from Vox, 30 from AS
2024-08-14 05:11:58,612 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 16 from Vox, 24 from AS
2024-08-14 05:11:59,751 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.503e+01 2.783e+01 3.185e+01 5.631e+01, threshold=5.566e+01, percent-clipped=1.0
2024-08-14 05:12:00,566 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.20 vs. limit=15.0
2024-08-14 05:12:24,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2494620.0, ans=0.125
2024-08-14 05:12:28,531 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 3100, loss[loss=0.09862, beats_loss=0.01059, ecapa_loss=0.0001533, whisper_loss=0.0865, over 23126.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01062, ecapa_loss=0.0001586, whisper_loss=0.09167, over 3940099.48 frames. ], batch size: 90, lr: 3.50e-03, grad_scale: 5.764607523034235e+17
2024-08-14 05:12:29,541 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.57 vs. limit=12.0
2024-08-14 05:12:31,604 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 18 from Vox, 25 from AS
2024-08-14 05:12:45,255 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.024e+01
2024-08-14 05:12:49,671 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts.
32 from LS+wenet, 19 from Vox, 32 from AS
2024-08-14 05:13:00,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2494920.0, ans=0.0
2024-08-14 05:13:26,305 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 19 from Vox, 36 from AS
2024-08-14 05:13:41,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2495220.0, ans=0.0
2024-08-14 05:13:42,105 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 3150, loss[loss=0.1067, beats_loss=0.0114, ecapa_loss=0.0001305, whisper_loss=0.09401, over 16819.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01061, ecapa_loss=0.0001591, whisper_loss=0.09229, over 3919109.39 frames. ], batch size: 64, lr: 3.50e-03, grad_scale: 5.764607523034235e+17
2024-08-14 05:13:43,798 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 20 from Vox, 28 from AS
2024-08-14 05:13:48,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2495220.0, ans=0.2
2024-08-14 05:14:01,325 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 from AS
2024-08-14 05:14:15,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2495420.0, ans=0.0
2024-08-14 05:14:25,939 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.378e+01 2.581e+01 2.876e+01 7.737e+01, threshold=5.161e+01, percent-clipped=2.0
2024-08-14 05:14:26,733 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.14 vs. limit=15.0
2024-08-14 05:14:32,361 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts.
16 from LS+wenet, 14 from Vox, 26 from AS
2024-08-14 05:14:41,551 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0
2024-08-14 05:14:48,568 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 from AS
2024-08-14 05:14:52,215 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.80 vs. limit=15.0
2024-08-14 05:14:53,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2495620.0, ans=0.0
2024-08-14 05:14:55,864 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 3200, loss[loss=0.09464, beats_loss=0.01229, ecapa_loss=0.0001562, whisper_loss=0.08079, over 16858.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01063, ecapa_loss=0.0001588, whisper_loss=0.09268, over 3893008.05 frames. ], batch size: 67, lr: 3.50e-03, grad_scale: 5.764607523034235e+17
2024-08-14 05:14:57,462 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 40 from LS+wenet, 17 from Vox, 32 from AS
2024-08-14 05:14:59,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2495720.0, ans=0.125
2024-08-14 05:15:11,654 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 17 from LS+wenet, 17 from Vox, 43 from AS
2024-08-14 05:15:14,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2495820.0, ans=0.2
2024-08-14 05:15:23,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2495920.0, ans=0.2
2024-08-14 05:15:52,350 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts.
23 from LS+wenet, 14 from Vox, 29 from AS
2024-08-14 05:15:54,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2496120.0, ans=0.125
2024-08-14 05:16:08,365 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 3250, loss[loss=0.1112, beats_loss=0.00974, ecapa_loss=0.0001496, whisper_loss=0.09996, over 19743.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01062, ecapa_loss=0.0001581, whisper_loss=0.09281, over 3908746.35 frames. ], batch size: 77, lr: 3.50e-03, grad_scale: 5.764607523034235e+17
2024-08-14 05:16:17,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2496220.0, ans=0.125
2024-08-14 05:16:25,468 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0
2024-08-14 05:16:26,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2496320.0, ans=0.1
2024-08-14 05:16:32,032 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 21 from Vox, 40 from AS
2024-08-14 05:16:33,469 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 21 from Vox, 26 from AS
2024-08-14 05:16:45,993 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 24 from Vox, 41 from AS
2024-08-14 05:16:48,861 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 20 from LS+wenet, 19 from Vox, 37 from AS
2024-08-14 05:16:51,223 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.408e+01 2.775e+01 3.145e+01 3.018e+02, threshold=5.551e+01, percent-clipped=3.0
2024-08-14 05:17:04,834 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 13 from Vox, 28 from AS
2024-08-14 05:17:06,405 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts.
16 from LS+wenet, 29 from Vox, 33 from AS
2024-08-14 05:17:12,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2496620.0, ans=0.07
2024-08-14 05:17:18,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2496620.0, ans=0.07
2024-08-14 05:17:20,488 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 3300, loss[loss=0.1105, beats_loss=0.01182, ecapa_loss=0.0001216, whisper_loss=0.09746, over 18409.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01076, ecapa_loss=0.0001569, whisper_loss=0.09199, over 3923257.31 frames. ], batch size: 71, lr: 3.50e-03, grad_scale: 5.764607523034235e+17
2024-08-14 05:17:22,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2496720.0, ans=0.2
2024-08-14 05:17:50,365 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 13 from LS+wenet, 18 from Vox, 27 from AS
2024-08-14 05:17:52,512 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.92 vs. limit=15.0
2024-08-14 05:18:22,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2497120.0, ans=0.0
2024-08-14 05:18:33,916 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 3350, loss[loss=0.1336, beats_loss=0.008867, ecapa_loss=0.0001529, whisper_loss=0.1232, over 23933.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01077, ecapa_loss=0.0001564, whisper_loss=0.09134, over 3909271.25 frames. ], batch size: 91, lr: 3.50e-03, grad_scale: 5.764607523034235e+17
2024-08-14 05:18:34,155 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts.
24 from LS+wenet, 26 from Vox, 36 from AS
2024-08-14 05:18:42,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2497220.0, ans=0.2
2024-08-14 05:18:42,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2497220.0, ans=0.125
2024-08-14 05:19:02,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2497420.0, ans=0.1
2024-08-14 05:19:15,721 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.81 vs. limit=22.5
2024-08-14 05:19:16,822 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 from AS
2024-08-14 05:19:17,823 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.312e+01 2.517e+01 2.799e+01 4.556e+01, threshold=5.034e+01, percent-clipped=0.0
2024-08-14 05:19:22,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2497520.0, ans=0.125
2024-08-14 05:19:22,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2497520.0, ans=0.1
2024-08-14 05:19:47,271 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 3400, loss[loss=0.09199, beats_loss=0.01185, ecapa_loss=0.0001305, whisper_loss=0.07883, over 23424.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01079, ecapa_loss=0.0001554, whisper_loss=0.09095, over 3908732.07 frames.
], batch size: 92, lr: 3.50e-03, grad_scale: 1.152921504606847e+18
2024-08-14 05:19:49,357 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2497720.0, ans=0.0
2024-08-14 05:20:03,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2497820.0, ans=0.125
2024-08-14 05:20:46,098 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.42 vs. limit=22.5
2024-08-14 05:20:50,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2498120.0, ans=0.125
2024-08-14 05:20:59,434 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 3450, loss[loss=0.1267, beats_loss=0.008833, ecapa_loss=0.0001491, whisper_loss=0.1164, over 22970.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01078, ecapa_loss=0.0001577, whisper_loss=0.0905, over 3923890.69 frames. ], batch size: 91, lr: 3.50e-03, grad_scale: 1.152921504606847e+18
2024-08-14 05:21:04,721 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2498220.0, ans=0.1
2024-08-14 05:21:10,024 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 27 from LS+wenet, 16 from Vox, 29 from AS
2024-08-14 05:21:26,829 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.99 vs. limit=15.0
2024-08-14 05:21:34,118 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=15.0
2024-08-14 05:21:37,729 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 31 from LS+wenet, 27 from Vox, 29 from AS
2024-08-14 05:21:39,268 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts.
20 from LS+wenet, 14 from Vox, 41 from AS
2024-08-14 05:21:43,419 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.288e+01 2.702e+01 3.056e+01 2.683e+02, threshold=5.405e+01, percent-clipped=1.0
2024-08-14 05:21:51,165 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 21 from Vox, 29 from AS
2024-08-14 05:21:52,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2498520.0, ans=0.125
2024-08-14 05:22:04,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2498620.0, ans=0.0
2024-08-14 05:22:12,748 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 3500, loss[loss=0.1121, beats_loss=0.007831, ecapa_loss=0.0001637, whisper_loss=0.1027, over 21015.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01083, ecapa_loss=0.0001576, whisper_loss=0.08994, over 3925047.67 frames. ], batch size: 83, lr: 3.50e-03, grad_scale: 1.152921504606847e+18
2024-08-14 05:22:16,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2498720.0, ans=0.1
2024-08-14 05:22:19,141 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 from AS
2024-08-14 05:22:45,475 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.13 vs. limit=15.0
2024-08-14 05:22:49,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2498920.0, ans=0.125
2024-08-14 05:22:51,450 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.48 vs.
limit=22.5
2024-08-14 05:22:52,794 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.59 vs. limit=15.0
2024-08-14 05:22:53,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2498920.0, ans=0.0
2024-08-14 05:23:06,566 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 20 from LS+wenet, 27 from Vox, 46 from AS
2024-08-14 05:23:11,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2499120.0, ans=0.125
2024-08-14 05:23:19,717 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 31 from Vox, 29 from AS
2024-08-14 05:23:25,432 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 3550, loss[loss=0.09469, beats_loss=0.01199, ecapa_loss=0.0001378, whisper_loss=0.08132, over 20252.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01084, ecapa_loss=0.0001579, whisper_loss=0.0904, over 3940106.01 frames.
], batch size: 79, lr: 3.50e-03, grad_scale: 1.152921504606847e+18
2024-08-14 05:23:30,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2499220.0, ans=0.0
2024-08-14 05:23:57,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2499420.0, ans=0.2
2024-08-14 05:23:57,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2499420.0, ans=0.125
2024-08-14 05:24:10,034 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.402e+01 2.607e+01 2.928e+01 5.339e+01, threshold=5.213e+01, percent-clipped=0.0
2024-08-14 05:24:15,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2499520.0, ans=0.125
2024-08-14 05:24:15,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2499520.0, ans=0.1
2024-08-14 05:24:20,761 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 21 from Vox, 22 from AS
2024-08-14 05:24:37,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2499620.0, ans=0.2
2024-08-14 05:24:39,711 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 3600, loss[loss=0.1052, beats_loss=0.01145, ecapa_loss=0.0001293, whisper_loss=0.09248, over 18027.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01074, ecapa_loss=0.0001594, whisper_loss=0.09103, over 3910600.91 frames. ], batch size: 71, lr: 3.50e-03, grad_scale: 1.152921504606847e+18
2024-08-14 05:24:56,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2499820.0, ans=0.2
2024-08-14 05:25:03,710 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts.
19 from LS+wenet, 16 from Vox, 29 from AS
2024-08-14 05:25:06,549 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 26 from Vox, 29 from AS
2024-08-14 05:25:06,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2499820.0, ans=0.0
2024-08-14 05:25:25,471 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 14 from Vox, 26 from AS
2024-08-14 05:25:41,572 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 16 from Vox, 32 from AS
2024-08-14 05:25:53,219 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 3650, loss[loss=0.07742, beats_loss=0.01294, ecapa_loss=0.0001483, whisper_loss=0.063, over 19042.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01073, ecapa_loss=0.0001591, whisper_loss=0.091, over 3885794.41 frames. ], batch size: 78, lr: 3.50e-03, grad_scale: 1.152921504606847e+18
2024-08-14 05:26:04,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2500220.0, ans=0.125
2024-08-14 05:26:37,999 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.437e+01 2.673e+01 3.010e+01 1.345e+02, threshold=5.347e+01, percent-clipped=1.0
2024-08-14 05:26:48,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2500520.0, ans=0.125
2024-08-14 05:26:51,318 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 26 from LS+wenet, 17 from Vox, 26 from AS
2024-08-14 05:27:05,124 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.05 vs. limit=22.5
2024-08-14 05:27:05,935 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts.
27 from LS+wenet, 15 from Vox, 37 from AS
2024-08-14 05:27:07,328 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 3700, loss[loss=0.1083, beats_loss=0.01205, ecapa_loss=0.0001312, whisper_loss=0.09495, over 20349.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01071, ecapa_loss=0.0001587, whisper_loss=0.09098, over 3840758.46 frames. ], batch size: 79, lr: 3.50e-03, grad_scale: 1.152921504606847e+18
2024-08-14 05:27:19,638 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.79 vs. limit=15.0
2024-08-14 05:27:33,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2500820.0, ans=0.0
2024-08-14 05:27:43,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2500920.0, ans=0.0
2024-08-14 05:28:11,329 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 15 from LS+wenet, 16 from Vox, 27 from AS
2024-08-14 05:28:14,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2501120.0, ans=0.125
2024-08-14 05:28:19,958 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 3750, loss[loss=0.0979, beats_loss=0.01056, ecapa_loss=0.0001837, whisper_loss=0.08551, over 21607.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01078, ecapa_loss=0.0001589, whisper_loss=0.09033, over 3828561.19 frames. ], batch size: 88, lr: 3.50e-03, grad_scale: 1.152921504606847e+18
2024-08-14 05:28:23,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2501220.0, ans=0.125
2024-08-14 05:28:31,116 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.91 vs.
limit=12.0
2024-08-14 05:28:34,556 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.30 vs. limit=6.0
2024-08-14 05:28:41,723 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.64 vs. limit=12.0
2024-08-14 05:28:44,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2501320.0, ans=0.2
2024-08-14 05:28:48,868 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.24 vs. limit=10.0
2024-08-14 05:28:54,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2501420.0, ans=0.0
2024-08-14 05:29:01,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2501420.0, ans=0.0
2024-08-14 05:29:02,514 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 21 from Vox, 26 from AS
2024-08-14 05:29:03,670 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.382e+01 2.609e+01 2.989e+01 8.009e+01, threshold=5.218e+01, percent-clipped=2.0
2024-08-14 05:29:14,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2501520.0, ans=0.0
2024-08-14 05:29:15,615 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts.
26 from LS+wenet, 22 from Vox, 31 from AS
2024-08-14 05:29:21,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2501620.0, ans=0.125
2024-08-14 05:29:21,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2501620.0, ans=0.1
2024-08-14 05:29:29,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2501620.0, ans=0.0
2024-08-14 05:29:32,757 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 3800, loss[loss=0.09865, beats_loss=0.01237, ecapa_loss=0.0001362, whisper_loss=0.08492, over 16916.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01071, ecapa_loss=0.0001608, whisper_loss=0.09034, over 3842016.63 frames. ], batch size: 65, lr: 3.50e-03, grad_scale: 1.152921504606847e+18
2024-08-14 05:29:34,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2501720.0, ans=0.07
2024-08-14 05:30:00,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2501820.0, ans=0.125
2024-08-14 05:30:01,195 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 35 from LS+wenet, 19 from Vox, 29 from AS
2024-08-14 05:30:07,974 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.06 vs. limit=6.0
2024-08-14 05:30:12,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2501920.0, ans=0.1
2024-08-14 05:30:15,220 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts.
23 from LS+wenet, 31 from Vox, 35 from AS
2024-08-14 05:30:18,504 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.90 vs. limit=12.0
2024-08-14 05:30:27,618 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 14 from Vox, 23 from AS
2024-08-14 05:30:37,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2502120.0, ans=0.125
2024-08-14 05:30:46,311 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 3850, loss[loss=0.1179, beats_loss=0.01057, ecapa_loss=0.0001296, whisper_loss=0.1061, over 22290.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0107, ecapa_loss=0.0001607, whisper_loss=0.09061, over 3854749.69 frames. ], batch size: 84, lr: 3.50e-03, grad_scale: 1.152921504606847e+18
2024-08-14 05:30:46,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2502220.0, ans=0.125
2024-08-14 05:30:52,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=2502220.0, ans=0.5
2024-08-14 05:30:58,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=2502220.0, ans=0.025
2024-08-14 05:31:04,380 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 15 from Vox, 24 from AS
2024-08-14 05:31:24,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2502420.0, ans=0.125
2024-08-14 05:31:25,619 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts.
33 from LS+wenet, 19 from Vox, 37 from AS
2024-08-14 05:31:28,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2502520.0, ans=0.125
2024-08-14 05:31:29,597 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.353e+01 2.523e+01 2.870e+01 4.680e+01, threshold=5.047e+01, percent-clipped=0.0
2024-08-14 05:31:45,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2502620.0, ans=0.125
2024-08-14 05:31:47,940 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 14 from LS+wenet, 19 from Vox, 24 from AS
2024-08-14 05:31:48,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2502620.0, ans=0.0
2024-08-14 05:31:57,587 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.01 vs. limit=22.5
2024-08-14 05:31:59,162 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 3900, loss[loss=0.1174, beats_loss=0.009287, ecapa_loss=0.0001659, whisper_loss=0.1064, over 22585.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01069, ecapa_loss=0.000161, whisper_loss=0.09136, over 3888615.57 frames.
], batch size: 90, lr: 3.50e-03, grad_scale: 1.152921504606847e+18
2024-08-14 05:32:08,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2502720.0, ans=0.125
2024-08-14 05:32:08,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2502720.0, ans=0.0
2024-08-14 05:32:22,819 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.310e-03
2024-08-14 05:32:24,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2502820.0, ans=0.1
2024-08-14 05:32:31,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2502920.0, ans=0.0
2024-08-14 05:32:36,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2502920.0, ans=0.125
2024-08-14 05:32:49,176 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 19 from Vox, 25 from AS
2024-08-14 05:32:51,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2503020.0, ans=0.125
2024-08-14 05:32:52,185 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 25 from Vox, 38 from AS
2024-08-14 05:33:12,062 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 3950, loss[loss=0.1173, beats_loss=0.008535, ecapa_loss=0.0001942, whisper_loss=0.1068, over 16476.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01073, ecapa_loss=0.0001608, whisper_loss=0.09164, over 3907627.83 frames. ], batch size: 65, lr: 3.50e-03, grad_scale: 1.152921504606847e+18
2024-08-14 05:33:20,185 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.73 vs.
limit=10.0
2024-08-14 05:33:22,801 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.26 vs. limit=15.0
2024-08-14 05:33:23,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2503220.0, ans=0.0
2024-08-14 05:33:29,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2503320.0, ans=0.125
2024-08-14 05:33:55,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2503520.0, ans=0.0
2024-08-14 05:33:55,863 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.430e+01 2.817e+01 3.192e+01 2.202e+02, threshold=5.633e+01, percent-clipped=4.0
2024-08-14 05:34:01,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2503520.0, ans=0.0
2024-08-14 05:34:18,269 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 from AS
2024-08-14 05:34:25,027 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 4000, loss[loss=0.09725, beats_loss=0.01229, ecapa_loss=0.0001284, whisper_loss=0.08368, over 21900.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01062, ecapa_loss=0.0001617, whisper_loss=0.09204, over 3899387.94 frames. ], batch size: 86, lr: 3.50e-03, grad_scale: 1.152921504606847e+18
2024-08-14 05:34:28,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2503720.0, ans=0.125
2024-08-14 05:34:31,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2503720.0, ans=0.0
2024-08-14 05:34:47,043 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts.
21 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-14 05:35:01,908 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 27 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-14 05:35:31,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2504120.0, ans=0.0 2024-08-14 05:35:31,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2504120.0, ans=0.125 2024-08-14 05:35:32,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2504120.0, ans=0.125 2024-08-14 05:35:38,620 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 4050, loss[loss=0.1105, beats_loss=0.009652, ecapa_loss=0.0001786, whisper_loss=0.0991, over 20081.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01066, ecapa_loss=0.0001612, whisper_loss=0.09195, over 3938969.45 frames. ], batch size: 81, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:36:19,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2504420.0, ans=0.0 2024-08-14 05:36:20,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2504420.0, ans=0.1 2024-08-14 05:36:22,927 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.679e+01 2.286e+01 2.527e+01 2.897e+01 4.039e+01, threshold=5.053e+01, percent-clipped=0.0 2024-08-14 05:36:27,777 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
27 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-14 05:36:30,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2504520.0, ans=0.0 2024-08-14 05:36:51,931 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 4100, loss[loss=0.1179, beats_loss=0.0112, ecapa_loss=0.0001582, whisper_loss=0.1051, over 23165.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01072, ecapa_loss=0.0001592, whisper_loss=0.09156, over 3917812.20 frames. ], batch size: 92, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:36:52,886 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.62 vs. limit=22.5 2024-08-14 05:37:17,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2504820.0, ans=0.125 2024-08-14 05:37:27,561 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.90 vs. limit=15.0 2024-08-14 05:37:40,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2505020.0, ans=0.07 2024-08-14 05:37:41,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2505020.0, ans=0.125 2024-08-14 05:37:47,889 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.655e+01 2024-08-14 05:37:55,829 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.15 vs. limit=10.0 2024-08-14 05:38:04,941 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 4150, loss[loss=0.07039, beats_loss=0.01444, ecapa_loss=0.0001352, whisper_loss=0.0546, over 14187.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01076, ecapa_loss=0.0001596, whisper_loss=0.09117, over 3911238.59 frames. ], batch size: 58, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:38:09,309 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 16 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-14 05:38:10,807 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-14 05:38:11,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2505220.0, ans=0.125 2024-08-14 05:38:25,154 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 23 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-14 05:38:32,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2505420.0, ans=0.0 2024-08-14 05:38:36,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2505420.0, ans=0.125 2024-08-14 05:38:37,903 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-14 05:38:42,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2505420.0, ans=0.125 2024-08-14 05:38:49,807 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.395e+01 2.659e+01 2.961e+01 5.291e+01, threshold=5.319e+01, percent-clipped=1.0 2024-08-14 05:39:08,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2505620.0, ans=0.0 2024-08-14 05:39:15,987 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.15 vs. 
limit=15.0 2024-08-14 05:39:17,878 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 4200, loss[loss=0.09395, beats_loss=0.01112, ecapa_loss=0.0001761, whisper_loss=0.08107, over 15543.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01082, ecapa_loss=0.0001593, whisper_loss=0.09109, over 3860864.17 frames. ], batch size: 62, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:39:19,578 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-14 05:39:27,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2505720.0, ans=0.0 2024-08-14 05:39:36,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2505820.0, ans=0.2 2024-08-14 05:39:46,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2505920.0, ans=0.0 2024-08-14 05:40:09,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2506020.0, ans=0.1 2024-08-14 05:40:12,178 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 05:40:21,163 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 19 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-14 05:40:31,235 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 4250, loss[loss=0.101, beats_loss=0.009075, ecapa_loss=0.0001704, whisper_loss=0.0902, over 15268.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01079, ecapa_loss=0.0001591, whisper_loss=0.09109, over 3883139.32 frames. ], batch size: 62, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:40:34,362 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
20 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-14 05:41:08,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2506420.0, ans=0.0 2024-08-14 05:41:16,270 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.332e+01 2.542e+01 2.821e+01 5.499e+01, threshold=5.083e+01, percent-clipped=1.0 2024-08-14 05:41:21,210 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.717e-02 2024-08-14 05:41:27,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2506520.0, ans=0.04949747468305833 2024-08-14 05:41:37,144 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 34 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-14 05:41:40,328 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-14 05:41:44,631 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 4300, loss[loss=0.07476, beats_loss=0.01408, ecapa_loss=0.0001098, whisper_loss=0.05958, over 14209.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01079, ecapa_loss=0.0001595, whisper_loss=0.09069, over 3842424.98 frames. ], batch size: 55, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:42:09,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2506820.0, ans=0.125 2024-08-14 05:42:12,332 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 16 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-14 05:42:15,107 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 27 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-14 05:42:18,211 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. 
limit=6.0 2024-08-14 05:42:25,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2506920.0, ans=0.0 2024-08-14 05:42:27,942 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-14 05:42:42,839 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 30 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 05:42:49,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2507120.0, ans=0.1 2024-08-14 05:42:51,954 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 15 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-14 05:42:59,072 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 4350, loss[loss=0.09061, beats_loss=0.01271, ecapa_loss=0.0001409, whisper_loss=0.0765, over 22421.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01076, ecapa_loss=0.0001592, whisper_loss=0.09023, over 3809830.92 frames. ], batch size: 91, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:43:11,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2507220.0, ans=0.5 2024-08-14 05:43:19,035 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.81 vs. limit=15.0 2024-08-14 05:43:21,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2507320.0, ans=0.1 2024-08-14 05:43:30,023 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 16 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 05:43:43,958 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.369e+01 2.648e+01 3.108e+01 4.930e+01, threshold=5.296e+01, percent-clipped=0.0 2024-08-14 05:43:44,178 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
20 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-14 05:44:06,646 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-14 05:44:12,382 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 4400, loss[loss=0.1134, beats_loss=0.01017, ecapa_loss=0.0001846, whisper_loss=0.1013, over 21679.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01078, ecapa_loss=0.0001579, whisper_loss=0.09085, over 3819332.16 frames. ], batch size: 90, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:44:22,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2507720.0, ans=0.05 2024-08-14 05:44:23,468 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 21 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-14 05:44:41,963 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 13 from Vox, 44 fro AS 2024-08-14 05:44:45,172 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-14 05:44:55,631 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-14 05:45:07,785 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-14 05:45:09,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2508020.0, ans=0.0 2024-08-14 05:45:18,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2508120.0, ans=0.125 2024-08-14 05:45:27,856 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 4450, loss[loss=0.08103, beats_loss=0.01342, ecapa_loss=0.0001278, whisper_loss=0.06633, over 16005.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01075, ecapa_loss=0.0001572, whisper_loss=0.09093, over 3874165.85 frames. 
], batch size: 63, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:45:28,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2508220.0, ans=0.125 2024-08-14 05:45:39,935 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-14 05:45:57,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2508420.0, ans=0.09899494936611666 2024-08-14 05:46:00,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2508420.0, ans=0.2 2024-08-14 05:46:02,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2508420.0, ans=0.0 2024-08-14 05:46:13,341 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.080e+01 2.408e+01 2.704e+01 3.117e+01 4.091e+01, threshold=5.407e+01, percent-clipped=0.0 2024-08-14 05:46:16,737 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-14 05:46:19,892 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 35 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 05:46:21,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2508520.0, ans=0.125 2024-08-14 05:46:25,354 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.66 vs. limit=15.0 2024-08-14 05:46:30,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2508620.0, ans=0.125 2024-08-14 05:46:34,246 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.97 vs. 
limit=6.0 2024-08-14 05:46:42,270 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 4500, loss[loss=0.1354, beats_loss=0.007198, ecapa_loss=0.0001575, whisper_loss=0.1266, over 18439.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01074, ecapa_loss=0.0001576, whisper_loss=0.09083, over 3882817.96 frames. ], batch size: 68, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:46:43,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2508720.0, ans=0.0 2024-08-14 05:46:44,727 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.30 vs. limit=22.5 2024-08-14 05:46:57,740 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 26 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-14 05:47:06,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2508820.0, ans=0.125 2024-08-14 05:47:11,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=2508820.0, ans=10.0 2024-08-14 05:47:36,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2509020.0, ans=0.125 2024-08-14 05:47:39,149 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 25 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-14 05:47:39,882 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.65 vs. 
limit=15.0 2024-08-14 05:47:54,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2509120.0, ans=0.125 2024-08-14 05:48:01,717 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 4550, loss[loss=0.1361, beats_loss=0.007577, ecapa_loss=0.0001621, whisper_loss=0.1269, over 17776.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01076, ecapa_loss=0.0001569, whisper_loss=0.091, over 3909630.19 frames. ], batch size: 64, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:48:03,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2509220.0, ans=0.125 2024-08-14 05:48:29,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2509320.0, ans=0.025 2024-08-14 05:48:34,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2509420.0, ans=0.125 2024-08-14 05:48:36,815 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.65 vs. limit=12.0 2024-08-14 05:48:47,688 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.86 vs. limit=15.0 2024-08-14 05:48:51,401 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.350e+01 2.581e+01 3.010e+01 9.450e+01, threshold=5.163e+01, percent-clipped=2.0 2024-08-14 05:49:00,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2509520.0, ans=0.125 2024-08-14 05:49:02,964 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
27 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-14 05:49:03,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2509520.0, ans=0.0 2024-08-14 05:49:09,177 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 28 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-14 05:49:10,666 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-14 05:49:12,433 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-14 05:49:20,649 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 4600, loss[loss=0.1187, beats_loss=0.009464, ecapa_loss=0.0001574, whisper_loss=0.1077, over 17730.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0107, ecapa_loss=0.0001578, whisper_loss=0.09092, over 3888080.96 frames. ], batch size: 69, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:49:37,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2509820.0, ans=0.0 2024-08-14 05:49:44,883 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 23 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-14 05:49:52,672 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 20 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-14 05:49:57,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2509920.0, ans=0.04949747468305833 2024-08-14 05:50:13,522 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 05:50:26,096 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=22.5 2024-08-14 05:50:41,114 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 4650, loss[loss=0.09968, beats_loss=0.008084, ecapa_loss=0.0002284, whisper_loss=0.08931, over 13160.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01072, ecapa_loss=0.0001581, whisper_loss=0.09068, over 3886328.80 frames. ], batch size: 56, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:50:45,502 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.51 vs. limit=6.0 2024-08-14 05:50:53,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2510220.0, ans=0.0 2024-08-14 05:50:57,540 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-14 05:51:09,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2510320.0, ans=0.125 2024-08-14 05:51:09,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2510320.0, ans=0.04949747468305833 2024-08-14 05:51:10,294 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 24 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 05:51:10,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2510320.0, ans=0.0 2024-08-14 05:51:11,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2510420.0, ans=0.0 2024-08-14 05:51:17,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2510420.0, ans=0.2 2024-08-14 05:51:30,950 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.365e+01 2.623e+01 2.877e+01 4.425e+01, threshold=5.246e+01, percent-clipped=0.0 2024-08-14 05:51:31,287 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
24 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-14 05:51:33,660 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.27 vs. limit=15.0 2024-08-14 05:51:34,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2510520.0, ans=0.04949747468305833 2024-08-14 05:52:00,146 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.38 vs. limit=12.0 2024-08-14 05:52:00,621 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 4700, loss[loss=0.08252, beats_loss=0.01299, ecapa_loss=0.0001348, whisper_loss=0.06818, over 14207.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01064, ecapa_loss=0.0001576, whisper_loss=0.0916, over 3861283.72 frames. ], batch size: 56, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:52:05,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2510720.0, ans=0.125 2024-08-14 05:52:17,482 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.44 vs. limit=12.0 2024-08-14 05:52:18,586 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 05:52:35,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2510920.0, ans=0.0 2024-08-14 05:52:41,501 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 05:52:47,583 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
37 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 05:52:47,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2511020.0, ans=0.1 2024-08-14 05:52:54,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2511020.0, ans=0.04949747468305833 2024-08-14 05:53:02,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2511120.0, ans=0.0 2024-08-14 05:53:15,457 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 30 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-14 05:53:17,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2511120.0, ans=0.125 2024-08-14 05:53:19,717 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 4750, loss[loss=0.1389, beats_loss=0.007226, ecapa_loss=0.0001755, whisper_loss=0.13, over 19599.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01061, ecapa_loss=0.0001579, whisper_loss=0.09202, over 3884033.30 frames. ], batch size: 76, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:53:23,316 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 21 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-14 05:53:25,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2511220.0, ans=0.125 2024-08-14 05:53:39,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2511320.0, ans=0.0 2024-08-14 05:53:47,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2511320.0, ans=0.125 2024-08-14 05:54:01,451 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
33 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 05:54:06,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2511520.0, ans=0.125 2024-08-14 05:54:08,784 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.355e+01 2.556e+01 2.982e+01 9.125e+01, threshold=5.113e+01, percent-clipped=1.0 2024-08-14 05:54:11,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2511520.0, ans=0.1 2024-08-14 05:54:26,578 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-08-14 05:54:29,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2511620.0, ans=0.125 2024-08-14 05:54:35,680 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.95 vs. limit=22.5 2024-08-14 05:54:36,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2511620.0, ans=0.125 2024-08-14 05:54:38,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2511720.0, ans=0.0 2024-08-14 05:54:39,589 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 4800, loss[loss=0.07997, beats_loss=0.0114, ecapa_loss=0.0002105, whisper_loss=0.06647, over 11847.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01072, ecapa_loss=0.0001582, whisper_loss=0.09124, over 3877523.84 frames. ], batch size: 53, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:54:42,603 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
26 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-14 05:55:20,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2511920.0, ans=0.0 2024-08-14 05:55:45,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2512120.0, ans=0.125 2024-08-14 05:55:50,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2512120.0, ans=0.1 2024-08-14 05:56:01,254 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 4850, loss[loss=0.09375, beats_loss=0.01236, ecapa_loss=0.0001854, whisper_loss=0.07954, over 20828.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01068, ecapa_loss=0.0001587, whisper_loss=0.09164, over 3911932.53 frames. ], batch size: 88, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:56:03,461 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 05:56:19,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2512320.0, ans=0.125 2024-08-14 05:56:20,005 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.11 vs. limit=15.0 2024-08-14 05:56:24,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2512320.0, ans=0.2 2024-08-14 05:56:26,613 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.81 vs. limit=22.5 2024-08-14 05:56:31,531 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.82 vs. 
limit=15.0 2024-08-14 05:56:33,571 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.32 vs. limit=15.0 2024-08-14 05:56:41,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2512420.0, ans=0.125 2024-08-14 05:56:50,701 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 29 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 05:56:51,718 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.430e+01 2.607e+01 2.996e+01 1.441e+02, threshold=5.214e+01, percent-clipped=2.0 2024-08-14 05:56:58,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2512520.0, ans=0.125 2024-08-14 05:57:22,313 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 4900, loss[loss=0.09487, beats_loss=0.01144, ecapa_loss=0.0001437, whisper_loss=0.08199, over 22679.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01071, ecapa_loss=0.0001585, whisper_loss=0.09163, over 3901452.79 frames. ], batch size: 91, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:57:25,474 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-14 05:57:30,602 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 05:57:39,177 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.25 vs. 
limit=8.0 2024-08-14 05:57:49,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2512820.0, ans=0.1 2024-08-14 05:57:52,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2512920.0, ans=0.125 2024-08-14 05:58:10,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2513020.0, ans=0.125 2024-08-14 05:58:15,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2513020.0, ans=0.0 2024-08-14 05:58:15,567 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.76 vs. limit=15.0 2024-08-14 05:58:20,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2513020.0, ans=0.0 2024-08-14 05:58:34,695 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 05:58:36,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2513120.0, ans=0.0 2024-08-14 05:58:40,348 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 4950, loss[loss=0.1195, beats_loss=0.01071, ecapa_loss=0.0001246, whisper_loss=0.1076, over 24285.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0107, ecapa_loss=0.0001581, whisper_loss=0.09119, over 3911284.63 frames. 
], batch size: 90, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:58:44,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2513220.0, ans=0.125 2024-08-14 05:58:50,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2513220.0, ans=0.125 2024-08-14 05:58:52,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2513220.0, ans=0.125 2024-08-14 05:59:13,112 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.70 vs. limit=15.0 2024-08-14 05:59:22,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2513420.0, ans=0.1 2024-08-14 05:59:29,311 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.690e+01 2.393e+01 2.657e+01 2.925e+01 4.625e+01, threshold=5.315e+01, percent-clipped=0.0 2024-08-14 05:59:29,736 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-14 05:59:32,388 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 05:59:32,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2513520.0, ans=0.5 2024-08-14 05:59:34,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2513520.0, ans=0.0 2024-08-14 05:59:38,356 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 05:59:40,094 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
29 from LS+wenet, 10 from Vox, 34 fro AS 2024-08-14 05:59:44,827 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 05:59:46,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2513620.0, ans=0.125 2024-08-14 05:59:58,113 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 5000, loss[loss=0.1054, beats_loss=0.0129, ecapa_loss=0.0001661, whisper_loss=0.09086, over 22459.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0107, ecapa_loss=0.0001582, whisper_loss=0.09142, over 3912328.91 frames. ], batch size: 92, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:59:58,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2513720.0, ans=0.125 2024-08-14 06:00:03,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2513720.0, ans=0.2 2024-08-14 06:00:30,187 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 19 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-14 06:00:38,163 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 22 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-14 06:00:56,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2514020.0, ans=0.0 2024-08-14 06:00:56,902 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.86 vs. limit=15.0 2024-08-14 06:00:59,044 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
34 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 06:01:02,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=2514120.0, ans=0.05 2024-08-14 06:01:12,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2514120.0, ans=0.025 2024-08-14 06:01:16,427 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 5050, loss[loss=0.1116, beats_loss=0.01187, ecapa_loss=0.0001514, whisper_loss=0.0982, over 21836.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01082, ecapa_loss=0.0001575, whisper_loss=0.09155, over 3919572.30 frames. ], batch size: 89, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:01:46,674 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 16 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-14 06:01:48,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2514420.0, ans=0.5 2024-08-14 06:01:58,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2514420.0, ans=0.0 2024-08-14 06:01:59,371 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 06:02:03,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2514520.0, ans=0.0 2024-08-14 06:02:05,066 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.401e+01 2.602e+01 2.912e+01 4.134e+01, threshold=5.204e+01, percent-clipped=0.0 2024-08-14 06:02:08,535 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
28 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-14 06:02:14,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2514520.0, ans=0.0 2024-08-14 06:02:28,060 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 25 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-14 06:02:30,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2514620.0, ans=0.125 2024-08-14 06:02:34,689 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 5100, loss[loss=0.09336, beats_loss=0.01189, ecapa_loss=0.0001206, whisper_loss=0.08026, over 19247.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01082, ecapa_loss=0.0001567, whisper_loss=0.09138, over 3910291.67 frames. ], batch size: 73, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:02:48,792 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-14 06:03:16,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2514920.0, ans=0.1 2024-08-14 06:03:26,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2515020.0, ans=0.125 2024-08-14 06:03:41,968 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.42 vs. limit=15.0 2024-08-14 06:03:42,143 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.92 vs. limit=12.0 2024-08-14 06:03:50,936 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 16 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-14 06:03:53,599 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 5150, loss[loss=0.1004, beats_loss=0.01251, ecapa_loss=0.0001456, whisper_loss=0.0864, over 22037.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.01081, ecapa_loss=0.0001566, whisper_loss=0.09159, over 3889909.09 frames. ], batch size: 90, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:03:53,736 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-14 06:04:07,010 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-14 06:04:11,547 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-14 06:04:28,700 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 06:04:32,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2515420.0, ans=0.2 2024-08-14 06:04:42,457 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.426e+01 2.661e+01 3.204e+01 7.186e+01, threshold=5.323e+01, percent-clipped=2.0 2024-08-14 06:04:46,192 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-14 06:05:03,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2515620.0, ans=0.125 2024-08-14 06:05:12,764 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 5200, loss[loss=0.09421, beats_loss=0.01287, ecapa_loss=0.0001334, whisper_loss=0.08001, over 22544.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01081, ecapa_loss=0.0001559, whisper_loss=0.09174, over 3905898.55 frames. ], batch size: 92, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:05:15,254 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.52 vs. limit=15.0 2024-08-14 06:05:22,753 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
35 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 06:05:24,217 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 20 from LS+wenet, 24 from Vox, 49 fro AS 2024-08-14 06:05:24,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2515720.0, ans=0.05 2024-08-14 06:05:37,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2515820.0, ans=0.1 2024-08-14 06:05:43,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2515920.0, ans=0.05 2024-08-14 06:05:51,344 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.89 vs. limit=15.0 2024-08-14 06:06:00,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2516020.0, ans=0.1 2024-08-14 06:06:01,155 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-14 06:06:03,537 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.41 vs. limit=15.0 2024-08-14 06:06:32,156 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 5250, loss[loss=0.1139, beats_loss=0.008038, ecapa_loss=0.0001325, whisper_loss=0.1045, over 17815.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01086, ecapa_loss=0.0001558, whisper_loss=0.09087, over 3907535.07 frames. ], batch size: 65, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:06:41,936 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
24 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-14 06:06:44,067 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.57 vs. limit=15.0 2024-08-14 06:06:47,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2516320.0, ans=0.1 2024-08-14 06:06:47,080 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.58 vs. limit=15.0 2024-08-14 06:06:52,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2516320.0, ans=0.125 2024-08-14 06:06:55,663 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.22 vs. limit=15.0 2024-08-14 06:07:08,565 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 06:07:14,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2516420.0, ans=0.0 2024-08-14 06:07:21,423 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.375e+01 2.671e+01 2.925e+01 9.126e+01, threshold=5.343e+01, percent-clipped=1.0 2024-08-14 06:07:23,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2516520.0, ans=0.125 2024-08-14 06:07:32,986 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 06:07:35,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2516620.0, ans=0.0 2024-08-14 06:07:35,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2516620.0, ans=0.125 2024-08-14 06:07:41,099 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-14 06:07:45,576 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.06 vs. limit=15.0 2024-08-14 06:07:52,381 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 5300, loss[loss=0.116, beats_loss=0.009896, ecapa_loss=0.0001975, whisper_loss=0.1041, over 16615.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01086, ecapa_loss=0.0001578, whisper_loss=0.09068, over 3897861.28 frames. ], batch size: 70, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:07:56,855 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-14 06:07:59,125 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 15 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-14 06:08:01,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2516720.0, ans=0.0 2024-08-14 06:08:03,485 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-14 06:08:10,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2516820.0, ans=0.0 2024-08-14 06:08:13,288 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
24 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 06:08:18,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2516820.0, ans=0.0 2024-08-14 06:08:24,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2516920.0, ans=0.0 2024-08-14 06:08:55,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2517120.0, ans=0.0 2024-08-14 06:09:12,422 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 5350, loss[loss=0.1049, beats_loss=0.01251, ecapa_loss=0.0001591, whisper_loss=0.0908, over 16767.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01082, ecapa_loss=0.0001574, whisper_loss=0.09075, over 3901370.18 frames. ], batch size: 66, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:09:19,766 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 06:09:35,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2517320.0, ans=0.1 2024-08-14 06:09:43,692 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
24 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-14 06:09:45,627 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.054e-01 2024-08-14 06:09:52,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2517420.0, ans=0.0 2024-08-14 06:10:01,938 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.326e+01 2.604e+01 3.065e+01 1.793e+02, threshold=5.208e+01, percent-clipped=2.0 2024-08-14 06:10:26,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2517620.0, ans=0.125 2024-08-14 06:10:32,405 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 5400, loss[loss=0.07626, beats_loss=0.01216, ecapa_loss=0.0001242, whisper_loss=0.06286, over 14483.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01072, ecapa_loss=0.0001572, whisper_loss=0.09083, over 3892340.01 frames. ], batch size: 55, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:11:06,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2517920.0, ans=0.125 2024-08-14 06:11:08,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2517920.0, ans=0.0 2024-08-14 06:11:29,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2518020.0, ans=0.125 2024-08-14 06:11:44,694 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 06:11:51,725 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 5450, loss[loss=0.1137, beats_loss=0.01239, ecapa_loss=0.0001451, whisper_loss=0.09983, over 16968.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01074, ecapa_loss=0.0001577, whisper_loss=0.09094, over 3902839.18 frames. 
], batch size: 69, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:11:52,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2518220.0, ans=0.125 2024-08-14 06:12:14,380 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 29 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 06:12:24,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2518420.0, ans=0.0 2024-08-14 06:12:24,995 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.60 vs. limit=10.0 2024-08-14 06:12:32,825 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 06:12:39,143 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-14 06:12:41,604 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.014e+01 2.369e+01 2.569e+01 2.930e+01 1.155e+02, threshold=5.138e+01, percent-clipped=3.0 2024-08-14 06:12:52,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=2518520.0, ans=0.05 2024-08-14 06:12:57,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2518620.0, ans=0.125 2024-08-14 06:12:57,590 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.00 vs. 
limit=15.0 2024-08-14 06:12:58,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2518620.0, ans=0.0 2024-08-14 06:13:10,371 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 5500, loss[loss=0.1149, beats_loss=0.01082, ecapa_loss=0.0001862, whisper_loss=0.1022, over 22138.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01077, ecapa_loss=0.0001562, whisper_loss=0.09116, over 3888334.63 frames. ], batch size: 95, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:13:27,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2518820.0, ans=0.035 2024-08-14 06:13:48,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2518920.0, ans=0.0 2024-08-14 06:13:50,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2518920.0, ans=0.125 2024-08-14 06:14:01,293 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-14 06:14:30,656 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 5550, loss[loss=0.07096, beats_loss=0.01363, ecapa_loss=0.0001609, whisper_loss=0.05573, over 19093.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01078, ecapa_loss=0.000156, whisper_loss=0.09091, over 3912103.91 frames. ], batch size: 81, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:14:31,848 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.38 vs. 
limit=15.0 2024-08-14 06:14:51,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=2519320.0, ans=0.2 2024-08-14 06:15:13,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2519420.0, ans=0.0 2024-08-14 06:15:17,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2519420.0, ans=0.125 2024-08-14 06:15:21,745 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.291e+01 2.517e+01 2.810e+01 6.286e+01, threshold=5.034e+01, percent-clipped=1.0 2024-08-14 06:15:33,503 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.47 vs. limit=15.0 2024-08-14 06:15:50,532 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 5600, loss[loss=0.09169, beats_loss=0.01092, ecapa_loss=0.000177, whisper_loss=0.079, over 19877.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01077, ecapa_loss=0.0001555, whisper_loss=0.09163, over 3913940.01 frames. ], batch size: 79, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:15:57,492 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-14 06:16:02,942 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.63 vs. limit=15.0 2024-08-14 06:16:12,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2519820.0, ans=0.0 2024-08-14 06:16:12,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2519820.0, ans=0.125 2024-08-14 06:16:44,154 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
20 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-14 06:17:00,579 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.27 vs. limit=15.0 2024-08-14 06:17:05,573 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 34 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-14 06:17:10,372 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 5650, loss[loss=0.1115, beats_loss=0.01227, ecapa_loss=0.0001579, whisper_loss=0.0976, over 22379.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01064, ecapa_loss=0.0001581, whisper_loss=0.09173, over 3905408.29 frames. ], batch size: 91, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:17:17,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2520220.0, ans=0.125 2024-08-14 06:17:25,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2520320.0, ans=0.1 2024-08-14 06:17:35,022 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
28 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 06:17:36,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2520320.0, ans=0.2 2024-08-14 06:17:36,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2520320.0, ans=0.0 2024-08-14 06:17:39,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2520320.0, ans=0.2 2024-08-14 06:17:55,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2520420.0, ans=0.0 2024-08-14 06:18:00,646 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.353e+01 2.635e+01 2.874e+01 6.701e+01, threshold=5.270e+01, percent-clipped=1.0 2024-08-14 06:18:12,092 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.32 vs. limit=15.0 2024-08-14 06:18:22,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2520620.0, ans=0.125 2024-08-14 06:18:32,424 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 5700, loss[loss=0.1044, beats_loss=0.01002, ecapa_loss=0.0001491, whisper_loss=0.09289, over 16843.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01073, ecapa_loss=0.0001586, whisper_loss=0.09154, over 3924240.18 frames. ], batch size: 65, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:18:41,138 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.21 vs. 
limit=15.0 2024-08-14 06:18:58,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2520820.0, ans=0.2 2024-08-14 06:19:00,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2520820.0, ans=0.1 2024-08-14 06:19:02,060 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 25 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-14 06:19:14,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2520920.0, ans=0.0 2024-08-14 06:19:17,544 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 36 from LS+wenet, 25 from Vox, 17 fro AS 2024-08-14 06:19:34,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2521020.0, ans=0.0 2024-08-14 06:19:35,693 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-14 06:19:50,620 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.919e-02 2024-08-14 06:19:52,993 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 5750, loss[loss=0.1037, beats_loss=0.01121, ecapa_loss=0.0001674, whisper_loss=0.09084, over 21760.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01074, ecapa_loss=0.0001574, whisper_loss=0.09163, over 3919854.76 frames. ], batch size: 86, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:20:10,082 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.82 vs. 
limit=15.0 2024-08-14 06:20:12,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2521320.0, ans=0.125 2024-08-14 06:20:18,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2521320.0, ans=0.1 2024-08-14 06:20:28,604 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.76 vs. limit=22.5 2024-08-14 06:20:31,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2521420.0, ans=0.125 2024-08-14 06:20:34,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2521420.0, ans=0.1 2024-08-14 06:20:41,187 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.372e+01 2.640e+01 2.859e+01 6.893e+01, threshold=5.281e+01, percent-clipped=1.0 2024-08-14 06:20:54,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2521620.0, ans=0.05 2024-08-14 06:21:03,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2521620.0, ans=0.125 2024-08-14 06:21:12,408 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 5800, loss[loss=0.09414, beats_loss=0.01136, ecapa_loss=0.0001805, whisper_loss=0.08098, over 18167.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01062, ecapa_loss=0.0001585, whisper_loss=0.09125, over 3874147.19 frames. 
], batch size: 76, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:21:27,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2521820.0, ans=0.1 2024-08-14 06:21:32,093 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 06:22:10,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2522020.0, ans=0.2 2024-08-14 06:22:10,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2522020.0, ans=0.0 2024-08-14 06:22:12,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2522120.0, ans=0.125 2024-08-14 06:22:26,925 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 5850, loss[loss=0.1054, beats_loss=0.01333, ecapa_loss=0.0001787, whisper_loss=0.09027, over 20689.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01069, ecapa_loss=0.0001584, whisper_loss=0.09095, over 3878492.74 frames. 
], batch size: 86, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:22:28,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2522220.0, ans=0.125 2024-08-14 06:22:38,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2522220.0, ans=0.0 2024-08-14 06:22:42,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2522320.0, ans=0.125 2024-08-14 06:23:01,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2522420.0, ans=10.0 2024-08-14 06:23:02,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2522420.0, ans=0.125 2024-08-14 06:23:06,042 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.70 vs. limit=22.5 2024-08-14 06:23:06,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2522420.0, ans=0.1 2024-08-14 06:23:06,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2522420.0, ans=0.2 2024-08-14 06:23:10,957 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.428e+01 2.673e+01 2.941e+01 3.816e+01, threshold=5.346e+01, percent-clipped=0.0 2024-08-14 06:23:24,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2522620.0, ans=0.2 2024-08-14 06:23:28,929 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.07 vs. 
limit=22.5 2024-08-14 06:23:30,587 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.22 vs. limit=15.0 2024-08-14 06:23:38,372 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 5900, loss[loss=0.1257, beats_loss=0.0118, ecapa_loss=0.0001825, whisper_loss=0.1121, over 21795.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0108, ecapa_loss=0.0001574, whisper_loss=0.09033, over 3895187.76 frames. ], batch size: 88, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:24:18,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2523020.0, ans=0.0 2024-08-14 06:24:47,691 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 5950, loss[loss=0.09681, beats_loss=0.007847, ecapa_loss=0.0001741, whisper_loss=0.08722, over 14327.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01083, ecapa_loss=0.0001572, whisper_loss=0.09007, over 3901955.22 frames. ], batch size: 55, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:24:58,133 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.63 vs. limit=12.0 2024-08-14 06:25:03,054 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
38 from LS+wenet, 23 from Vox, 31 from AS 2024-08-14 06:25:19,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2523420.0, ans=0.0 2024-08-14 06:25:19,681 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 06:25:22,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2523420.0, ans=0.0 2024-08-14 06:25:30,605 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.432e+01 2.806e+01 3.149e+01 6.455e+01, threshold=5.612e+01, percent-clipped=2.0 2024-08-14 06:25:30,867 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 18 from Vox, 43 from AS 2024-08-14 06:25:35,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2523520.0, ans=0.1 2024-08-14 06:25:54,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2523620.0, ans=0.125 2024-08-14 06:25:56,744 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 6000, loss[loss=0.1129, beats_loss=0.01094, ecapa_loss=0.0001204, whisper_loss=0.1007, over 18826.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01084, ecapa_loss=0.0001582, whisper_loss=0.0908, over 3917477.68 frames. ], batch size: 68, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:25:56,744 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-14 06:26:36,853 INFO [train_multi_KD3.py:1149] (2/4) Epoch 18, validation on ASR_libri: loss=0.2513, beats_loss=0, ecapa_loss=0.0005424, whisper_loss=0.2459, over 922467.00 frames. 
2024-08-14 06:26:45,590 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.2469, 2.8046, 2.1006, 1.4537], device='cuda:2') 2024-08-14 06:26:55,917 INFO [train_multi_KD3.py:1149] (2/4) Epoch 18, validation on SV_voxceleb1: loss=0.004393, beats_loss=0, ecapa_loss=0.0004393, whisper_loss=0, over 939242.00 frames. 2024-08-14 06:28:56,676 INFO [train_multi_KD3.py:1149] (2/4) Epoch 18, validation on AT_audioset: loss=0.02347, beats_loss=0.02347, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 06:28:56,680 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-14 06:29:04,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2523720.0, ans=0.125 2024-08-14 06:29:10,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2523820.0, ans=0.125 2024-08-14 06:29:17,980 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2024-08-14 06:30:05,671 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 6050, loss[loss=0.1062, beats_loss=0.01108, ecapa_loss=0.0001544, whisper_loss=0.0936, over 21780.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0108, ecapa_loss=0.0001591, whisper_loss=0.09101, over 3889634.25 frames. ], batch size: 88, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:30:10,214 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 20 from LS+wenet, 21 from Vox, 53 from AS 2024-08-14 06:30:17,049 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
22 from LS+wenet, 16 from Vox, 43 from AS 2024-08-14 06:30:38,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2524420.0, ans=0.125 2024-08-14 06:30:38,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2524420.0, ans=0.125 2024-08-14 06:30:48,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2524520.0, ans=0.2 2024-08-14 06:30:49,367 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.623e+01 2.348e+01 2.542e+01 2.875e+01 5.513e+01, threshold=5.084e+01, percent-clipped=0.0 2024-08-14 06:31:10,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=2524620.0, ans=15.0 2024-08-14 06:31:15,164 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 6100, loss[loss=0.11, beats_loss=0.009267, ecapa_loss=0.0001629, whisper_loss=0.09909, over 20587.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01088, ecapa_loss=0.0001579, whisper_loss=0.09072, over 3928251.93 frames. 
], batch size: 80, lr: 3.48e-03, grad_scale: 1.152921504606847e+18 2024-08-14 06:31:35,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2524820.0, ans=0.05 2024-08-14 06:31:41,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2524820.0, ans=0.0 2024-08-14 06:31:42,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2524920.0, ans=0.07 2024-08-14 06:32:01,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2525020.0, ans=0.2 2024-08-14 06:32:05,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2525020.0, ans=0.0 2024-08-14 06:32:08,244 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.04 vs. limit=15.0 2024-08-14 06:32:13,588 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 25 from LS+wenet, 26 from Vox, 34 from AS 2024-08-14 06:32:25,694 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 6150, loss[loss=0.08751, beats_loss=0.01221, ecapa_loss=0.0001788, whisper_loss=0.07351, over 22168.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01093, ecapa_loss=0.0001574, whisper_loss=0.09061, over 3922159.90 frames. ], batch size: 92, lr: 3.48e-03, grad_scale: 1.152921504606847e+18 2024-08-14 06:32:30,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2525220.0, ans=0.125 2024-08-14 06:32:53,415 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
18 from LS+wenet, 20 from Vox, 39 from AS 2024-08-14 06:32:59,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2525420.0, ans=0.0 2024-08-14 06:33:03,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2525420.0, ans=0.0 2024-08-14 06:33:06,525 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.75 vs. limit=10.0 2024-08-14 06:33:10,118 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.296e+01 2.588e+01 2.950e+01 9.161e+01, threshold=5.175e+01, percent-clipped=1.0 2024-08-14 06:33:37,962 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 6200, loss[loss=0.1059, beats_loss=0.009508, ecapa_loss=0.0001561, whisper_loss=0.09484, over 18910.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01097, ecapa_loss=0.0001575, whisper_loss=0.09003, over 3887300.55 frames. ], batch size: 75, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:34:00,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2525820.0, ans=0.0 2024-08-14 06:34:07,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2525920.0, ans=0.125 2024-08-14 06:34:08,909 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.66 vs. limit=12.0 2024-08-14 06:34:26,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2526020.0, ans=0.125 2024-08-14 06:34:33,984 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
22 from LS+wenet, 19 from Vox, 37 from AS 2024-08-14 06:34:34,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2526020.0, ans=0.0 2024-08-14 06:34:35,341 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 20 from Vox, 26 from AS 2024-08-14 06:34:48,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2526120.0, ans=0.125 2024-08-14 06:34:49,538 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 22 from LS+wenet, 19 from Vox, 42 from AS 2024-08-14 06:34:54,180 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 6250, loss[loss=0.1031, beats_loss=0.009655, ecapa_loss=0.0001829, whisper_loss=0.09161, over 16591.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01094, ecapa_loss=0.0001573, whisper_loss=0.08932, over 3889758.46 frames. ], batch size: 68, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:35:00,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2526220.0, ans=0.125 2024-08-14 06:35:44,381 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.485e+01 2.719e+01 3.146e+01 4.092e+01, threshold=5.438e+01, percent-clipped=0.0 2024-08-14 06:36:12,045 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 6300, loss[loss=0.09533, beats_loss=0.009854, ecapa_loss=0.0001468, whisper_loss=0.08401, over 17107.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01091, ecapa_loss=0.000157, whisper_loss=0.08956, over 3861886.05 frames. 
], batch size: 66, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:36:17,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2526720.0, ans=0.0 2024-08-14 06:36:24,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2526720.0, ans=0.1 2024-08-14 06:36:35,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2526820.0, ans=0.125 2024-08-14 06:36:35,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2526820.0, ans=0.0 2024-08-14 06:36:36,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2526820.0, ans=0.125 2024-08-14 06:36:43,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2526920.0, ans=0.1 2024-08-14 06:36:54,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2526920.0, ans=0.125 2024-08-14 06:36:56,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2526920.0, ans=0.0 2024-08-14 06:37:03,701 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
24 from LS+wenet, 23 from Vox, 21 from AS 2024-08-14 06:37:12,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2527020.0, ans=0.1 2024-08-14 06:37:21,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2527120.0, ans=0.2 2024-08-14 06:37:22,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2527120.0, ans=0.0 2024-08-14 06:37:30,165 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.22 vs. limit=6.0 2024-08-14 06:37:30,648 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 6350, loss[loss=0.08838, beats_loss=0.01214, ecapa_loss=0.0001612, whisper_loss=0.07462, over 21214.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01089, ecapa_loss=0.0001571, whisper_loss=0.09035, over 3867390.49 frames. ], batch size: 90, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:37:45,371 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 20 from LS+wenet, 24 from Vox, 38 from AS 2024-08-14 06:38:02,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2527420.0, ans=0.125 2024-08-14 06:38:04,218 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=16.11 vs. 
limit=15.0 2024-08-14 06:38:16,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2527520.0, ans=0.2 2024-08-14 06:38:19,887 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.286e+01 2.522e+01 2.892e+01 3.872e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-14 06:38:24,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2527520.0, ans=0.2 2024-08-14 06:38:36,299 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 20 from LS+wenet, 16 from Vox, 40 from AS 2024-08-14 06:38:47,909 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 6400, loss[loss=0.09417, beats_loss=0.01237, ecapa_loss=0.0001589, whisper_loss=0.08021, over 19966.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01089, ecapa_loss=0.000157, whisper_loss=0.09069, over 3881418.61 frames. ], batch size: 80, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:38:52,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2527720.0, ans=0.0 2024-08-14 06:39:00,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2527720.0, ans=0.125 2024-08-14 06:39:00,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2527720.0, ans=0.125 2024-08-14 06:39:01,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2527720.0, ans=0.125 2024-08-14 06:39:03,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2527820.0, ans=0.125 2024-08-14 06:39:05,177 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.12 
vs. limit=15.0 2024-08-14 06:39:07,431 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 19 from Vox, 22 from AS 2024-08-14 06:39:09,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2527820.0, ans=0.07 2024-08-14 06:39:15,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2527820.0, ans=0.125 2024-08-14 06:39:20,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2527920.0, ans=0.0 2024-08-14 06:39:59,511 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 23 from Vox, 31 from AS 2024-08-14 06:40:00,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2528120.0, ans=0.0 2024-08-14 06:40:03,989 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 20 from LS+wenet, 29 from Vox, 32 from AS 2024-08-14 06:40:06,588 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 6450, loss[loss=0.1029, beats_loss=0.01129, ecapa_loss=0.0001259, whisper_loss=0.09038, over 14849.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01084, ecapa_loss=0.0001567, whisper_loss=0.09104, over 3881007.74 frames. 
], batch size: 54, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:40:11,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2528220.0, ans=0.0 2024-08-14 06:40:11,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2528220.0, ans=0.125 2024-08-14 06:40:18,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2528220.0, ans=0.125 2024-08-14 06:40:24,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2528320.0, ans=0.1 2024-08-14 06:40:56,960 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.368e+01 2.657e+01 3.046e+01 7.930e+01, threshold=5.314e+01, percent-clipped=1.0 2024-08-14 06:41:12,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2528620.0, ans=0.015 2024-08-14 06:41:24,188 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 6500, loss[loss=0.1117, beats_loss=0.00864, ecapa_loss=0.0001975, whisper_loss=0.1011, over 18448.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01085, ecapa_loss=0.0001576, whisper_loss=0.09133, over 3898635.68 frames. ], batch size: 74, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:41:30,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2528720.0, ans=0.0 2024-08-14 06:41:32,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2528720.0, ans=0.125 2024-08-14 06:41:36,067 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
27 from LS+wenet, 28 from Vox, 33 from AS 2024-08-14 06:41:44,988 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.57 vs. limit=22.5 2024-08-14 06:41:52,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2528820.0, ans=0.0 2024-08-14 06:42:14,027 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 18 from LS+wenet, 15 from Vox, 38 from AS 2024-08-14 06:42:17,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2529020.0, ans=0.125 2024-08-14 06:42:19,130 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.69 vs. limit=6.0 2024-08-14 06:42:43,615 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 6550, loss[loss=0.09322, beats_loss=0.01052, ecapa_loss=0.0001598, whisper_loss=0.0811, over 22062.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01083, ecapa_loss=0.0001561, whisper_loss=0.09151, over 3935159.78 frames. ], batch size: 90, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:42:57,365 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 from AS 2024-08-14 06:43:17,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2529420.0, ans=0.125 2024-08-14 06:43:18,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2529420.0, ans=0.125 2024-08-14 06:43:24,881 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
22 from LS+wenet, 14 from Vox, 29 from AS 2024-08-14 06:43:35,259 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.448e+01 2.627e+01 2.898e+01 7.209e+01, threshold=5.254e+01, percent-clipped=1.0 2024-08-14 06:43:46,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2529520.0, ans=0.0 2024-08-14 06:43:51,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2529620.0, ans=0.125 2024-08-14 06:44:05,904 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 6600, loss[loss=0.1027, beats_loss=0.01184, ecapa_loss=0.0001703, whisper_loss=0.08912, over 21035.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01088, ecapa_loss=0.0001573, whisper_loss=0.09077, over 3917610.16 frames. ], batch size: 89, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:44:34,239 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 19 from LS+wenet, 19 from Vox, 54 from AS 2024-08-14 06:44:44,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2529920.0, ans=0.0 2024-08-14 06:44:46,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2529920.0, ans=0.0 2024-08-14 06:45:15,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=2530120.0, ans=15.0 2024-08-14 06:45:28,174 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 6650, loss[loss=0.107, beats_loss=0.01253, ecapa_loss=0.0001544, whisper_loss=0.09296, over 17348.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01079, ecapa_loss=0.0001575, whisper_loss=0.09117, over 3902800.18 frames. ], batch size: 71, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:45:29,831 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
24 from LS+wenet, 19 from Vox, 34 from AS 2024-08-14 06:45:49,672 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 13 from Vox, 40 from AS 2024-08-14 06:46:20,583 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.361e+01 2.583e+01 2.896e+01 3.977e+01, threshold=5.167e+01, percent-clipped=0.0 2024-08-14 06:46:25,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2530520.0, ans=0.1 2024-08-14 06:46:43,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2530620.0, ans=0.125 2024-08-14 06:46:44,825 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.23 vs. limit=15.0 2024-08-14 06:46:45,248 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 13 from LS+wenet, 18 from Vox, 24 from AS 2024-08-14 06:46:48,231 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 6700, loss[loss=0.1005, beats_loss=0.01172, ecapa_loss=0.0001695, whisper_loss=0.08707, over 19638.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01082, ecapa_loss=0.0001573, whisper_loss=0.09008, over 3887877.92 frames. ], batch size: 80, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:47:04,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2530820.0, ans=0.125 2024-08-14 06:47:08,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2530820.0, ans=0.1 2024-08-14 06:47:12,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2530820.0, ans=0.07 2024-08-14 06:47:35,995 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
19 from LS+wenet, 23 from Vox, 42 from AS 2024-08-14 06:47:37,536 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 15 from LS+wenet, 24 from Vox, 43 from AS 2024-08-14 06:47:39,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2531020.0, ans=6.0 2024-08-14 06:48:03,054 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 15 from Vox, 28 from AS 2024-08-14 06:48:06,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2531120.0, ans=0.2 2024-08-14 06:48:16,233 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 6750, loss[loss=0.09643, beats_loss=0.007476, ecapa_loss=0.0002279, whisper_loss=0.08668, over 15054.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01085, ecapa_loss=0.0001581, whisper_loss=0.08982, over 3886594.73 frames. ], batch size: 63, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:48:17,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2531220.0, ans=0.0 2024-08-14 06:48:19,531 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.84 vs. limit=22.5 2024-08-14 06:48:49,884 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 14 from Vox, 28 from AS 2024-08-14 06:48:58,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=2531420.0, ans=0.05 2024-08-14 06:49:06,095 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 06:49:06,293 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.64 vs. 
limit=12.0 2024-08-14 06:49:06,966 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 25 from Vox, 38 from AS 2024-08-14 06:49:08,035 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.298e+01 2.539e+01 2.885e+01 4.400e+01, threshold=5.079e+01, percent-clipped=0.0 2024-08-14 06:49:14,907 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 23 from Vox, 33 from AS 2024-08-14 06:49:36,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2531620.0, ans=0.125 2024-08-14 06:49:39,204 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 6800, loss[loss=0.09758, beats_loss=0.01273, ecapa_loss=0.0001679, whisper_loss=0.08316, over 14812.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01084, ecapa_loss=0.0001586, whisper_loss=0.0896, over 3868926.00 frames. ], batch size: 63, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:49:40,695 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
16 from LS+wenet, 25 from Vox, 34 from AS 2024-08-14 06:49:41,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2531720.0, ans=0.04949747468305833 2024-08-14 06:49:58,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2531820.0, ans=0.125 2024-08-14 06:50:23,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2531920.0, ans=0.125 2024-08-14 06:50:41,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2532020.0, ans=0.125 2024-08-14 06:50:43,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2532020.0, ans=0.0 2024-08-14 06:51:09,777 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 6850, loss[loss=0.08953, beats_loss=0.0117, ecapa_loss=0.0001579, whisper_loss=0.07625, over 16358.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01074, ecapa_loss=0.000159, whisper_loss=0.08992, over 3867822.20 frames. ], batch size: 63, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:51:24,195 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
16 from LS+wenet, 21 from Vox, 20 from AS 2024-08-14 06:51:36,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2532320.0, ans=0.125 2024-08-14 06:51:36,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2532320.0, ans=0.125 2024-08-14 06:51:38,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2532320.0, ans=0.95 2024-08-14 06:51:38,287 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.61 vs. limit=12.0 2024-08-14 06:51:57,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2532420.0, ans=0.125 2024-08-14 06:52:01,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2532520.0, ans=0.125 2024-08-14 06:52:03,062 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.338e+01 2.591e+01 2.972e+01 6.425e+01, threshold=5.181e+01, percent-clipped=1.0 2024-08-14 06:52:06,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2532520.0, ans=0.1 2024-08-14 06:52:17,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2532520.0, ans=0.125 2024-08-14 06:52:35,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2532620.0, ans=0.1 2024-08-14 06:52:37,689 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
20 from LS+wenet, 31 from Vox, 31 from AS 2024-08-14 06:52:40,369 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 6900, loss[loss=0.09679, beats_loss=0.0135, ecapa_loss=0.000131, whisper_loss=0.08198, over 19150.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01068, ecapa_loss=0.000159, whisper_loss=0.0905, over 3865399.30 frames. ], batch size: 76, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:52:40,751 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 14 from Vox, 28 from AS 2024-08-14 06:52:54,024 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 24 from Vox, 43 from AS 2024-08-14 06:53:06,737 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 17 from Vox, 29 from AS 2024-08-14 06:53:37,884 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 27 from Vox, 30 from AS 2024-08-14 06:53:44,970 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 30 from Vox, 32 from AS 2024-08-14 06:53:49,611 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.25 vs. limit=22.5 2024-08-14 06:53:53,516 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 20 from LS+wenet, 18 from Vox, 45 from AS 2024-08-14 06:54:04,017 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 19 from Vox, 31 from AS 2024-08-14 06:54:11,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2533120.0, ans=0.1 2024-08-14 06:54:28,702 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.16 vs. limit=15.0 2024-08-14 06:54:30,185 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 6950, loss[loss=0.1289, beats_loss=0.007875, ecapa_loss=0.0001752, whisper_loss=0.1193, over 16067.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.0107, ecapa_loss=0.000157, whisper_loss=0.09075, over 3848348.19 frames. ], batch size: 64, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:54:57,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2533320.0, ans=0.0 2024-08-14 06:55:14,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2533420.0, ans=0.0 2024-08-14 06:55:24,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2533420.0, ans=0.125 2024-08-14 06:55:26,138 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-14 06:55:31,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2533420.0, ans=0.05 2024-08-14 06:55:32,063 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.69 vs. limit=10.0 2024-08-14 06:55:35,380 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-14 06:55:35,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2533520.0, ans=0.1 2024-08-14 06:55:41,201 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.379e+01 2.570e+01 2.940e+01 3.906e+01, threshold=5.139e+01, percent-clipped=0.0 2024-08-14 06:55:42,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2533520.0, ans=0.125 2024-08-14 06:55:51,539 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
25 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-14 06:55:53,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2533520.0, ans=0.1 2024-08-14 06:56:09,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2533620.0, ans=0.0 2024-08-14 06:56:18,676 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 06:56:20,526 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 7000, loss[loss=0.09961, beats_loss=0.01123, ecapa_loss=0.0001421, whisper_loss=0.08696, over 22581.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01078, ecapa_loss=0.0001574, whisper_loss=0.09035, over 3870481.38 frames. ], batch size: 89, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:56:53,791 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.88 vs. limit=15.0 2024-08-14 06:57:23,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2533920.0, ans=0.0 2024-08-14 06:57:40,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2534020.0, ans=0.0 2024-08-14 06:57:55,651 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 06:58:01,124 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 7050, loss[loss=0.123, beats_loss=0.008735, ecapa_loss=0.0001729, whisper_loss=0.1125, over 21430.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01084, ecapa_loss=0.0001576, whisper_loss=0.08968, over 3856220.24 frames. 
], batch size: 83, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:58:06,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2534220.0, ans=0.0 2024-08-14 06:58:22,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2534320.0, ans=0.1 2024-08-14 06:58:39,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2534420.0, ans=0.07 2024-08-14 06:58:47,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2534520.0, ans=0.1 2024-08-14 06:58:48,167 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.340e+01 2.637e+01 3.082e+01 1.011e+02, threshold=5.275e+01, percent-clipped=1.0 2024-08-14 06:58:58,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2534620.0, ans=0.2 2024-08-14 06:58:59,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=2534620.0, ans=12.0 2024-08-14 06:59:07,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2534620.0, ans=0.125 2024-08-14 06:59:11,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2534620.0, ans=0.125 2024-08-14 06:59:13,997 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 7100, loss[loss=0.1123, beats_loss=0.009947, ecapa_loss=0.0001794, whisper_loss=0.1006, over 16769.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01084, ecapa_loss=0.0001575, whisper_loss=0.08994, over 3850952.76 frames. 
], batch size: 64, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:59:19,366 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.25 vs. limit=15.0 2024-08-14 06:59:21,177 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.59 vs. limit=12.0 2024-08-14 06:59:58,634 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 07:00:23,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2535120.0, ans=0.125 2024-08-14 07:00:24,068 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-14 07:00:25,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2535120.0, ans=0.125 2024-08-14 07:00:31,581 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 7150, loss[loss=0.0901, beats_loss=0.01114, ecapa_loss=0.0001586, whisper_loss=0.07737, over 17932.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01082, ecapa_loss=0.0001565, whisper_loss=0.09, over 3879620.63 frames. 
], batch size: 71, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:00:35,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2535220.0, ans=0.125 2024-08-14 07:00:48,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2535320.0, ans=0.0 2024-08-14 07:01:10,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2535420.0, ans=0.125 2024-08-14 07:01:20,073 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.290e+01 2.562e+01 2.920e+01 7.577e+01, threshold=5.124e+01, percent-clipped=1.0 2024-08-14 07:01:32,309 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-14 07:01:32,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2535620.0, ans=0.0 2024-08-14 07:01:47,513 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 7200, loss[loss=0.1095, beats_loss=0.009484, ecapa_loss=0.0002068, whisper_loss=0.09798, over 17117.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01079, ecapa_loss=0.0001576, whisper_loss=0.08993, over 3878218.98 frames. ], batch size: 73, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:01:49,085 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 32 from Vox, 36 fro AS 2024-08-14 07:02:14,163 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 20 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-14 07:02:20,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2535920.0, ans=0.0 2024-08-14 07:02:21,320 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-14 07:02:29,652 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
22 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 07:02:38,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2536020.0, ans=0.1 2024-08-14 07:02:50,442 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.835e+05 2024-08-14 07:03:04,635 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 7250, loss[loss=0.09392, beats_loss=0.01203, ecapa_loss=0.0001706, whisper_loss=0.08018, over 16634.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01071, ecapa_loss=0.0001585, whisper_loss=0.09081, over 3898156.16 frames. ], batch size: 67, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:03:08,535 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.66 vs. limit=15.0 2024-08-14 07:03:10,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2536220.0, ans=0.1 2024-08-14 07:03:14,743 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 18 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 07:03:23,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2536320.0, ans=0.0 2024-08-14 07:03:31,510 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.02 vs. 
limit=15.0 2024-08-14 07:03:32,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2536320.0, ans=0.125 2024-08-14 07:03:55,221 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.431e+01 2.606e+01 2.894e+01 4.565e+01, threshold=5.211e+01, percent-clipped=0.0 2024-08-14 07:04:02,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2536520.0, ans=0.1 2024-08-14 07:04:07,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2536620.0, ans=0.0 2024-08-14 07:04:09,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2536620.0, ans=0.125 2024-08-14 07:04:11,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2536620.0, ans=0.125 2024-08-14 07:04:12,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2536620.0, ans=0.1 2024-08-14 07:04:22,893 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 7300, loss[loss=0.0821, beats_loss=0.01158, ecapa_loss=0.0001506, whisper_loss=0.06901, over 19739.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01069, ecapa_loss=0.000158, whisper_loss=0.09123, over 3892380.55 frames. ], batch size: 82, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:04:33,821 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 21 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-14 07:04:41,285 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-14 07:04:56,460 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
28 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-14 07:05:16,103 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-14 07:05:18,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2537020.0, ans=0.2 2024-08-14 07:05:22,587 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-14 07:05:28,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2537120.0, ans=0.125 2024-08-14 07:05:33,107 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.655e-03 2024-08-14 07:05:38,181 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 7350, loss[loss=0.09796, beats_loss=0.01344, ecapa_loss=0.0001127, whisper_loss=0.08339, over 23382.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01068, ecapa_loss=0.0001577, whisper_loss=0.09099, over 3838654.63 frames. ], batch size: 89, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:05:39,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2537220.0, ans=0.125 2024-08-14 07:05:41,200 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.52 vs. limit=12.0 2024-08-14 07:05:51,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2537220.0, ans=0.125 2024-08-14 07:05:53,855 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
17 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-14 07:05:54,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2537320.0, ans=0.0 2024-08-14 07:05:57,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2537320.0, ans=0.0 2024-08-14 07:06:24,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2537520.0, ans=0.1 2024-08-14 07:06:25,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2537520.0, ans=0.05 2024-08-14 07:06:26,866 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.465e+01 2.701e+01 2.899e+01 2.044e+02, threshold=5.402e+01, percent-clipped=2.0 2024-08-14 07:06:39,167 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 07:06:47,573 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-14 07:06:54,760 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 7400, loss[loss=0.114, beats_loss=0.01051, ecapa_loss=0.0001903, whisper_loss=0.1016, over 20196.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01076, ecapa_loss=0.000158, whisper_loss=0.09041, over 3832422.49 frames. ], batch size: 87, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:07:33,202 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.13 vs. 
limit=15.0 2024-08-14 07:07:38,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2537920.0, ans=0.125 2024-08-14 07:07:44,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2538020.0, ans=0.1 2024-08-14 07:08:01,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2538120.0, ans=0.125 2024-08-14 07:08:05,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2538120.0, ans=0.05 2024-08-14 07:08:08,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2538120.0, ans=0.1 2024-08-14 07:08:12,610 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 7450, loss[loss=0.1153, beats_loss=0.01084, ecapa_loss=0.0001678, whisper_loss=0.1028, over 19863.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01076, ecapa_loss=0.0001587, whisper_loss=0.09076, over 3839208.16 frames. ], batch size: 80, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:08:41,371 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.80 vs. limit=22.5 2024-08-14 07:09:05,388 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.389e+01 2.665e+01 3.000e+01 5.031e+01, threshold=5.329e+01, percent-clipped=0.0 2024-08-14 07:09:16,068 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.41 vs. 
limit=15.0 2024-08-14 07:09:20,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2538620.0, ans=0.2 2024-08-14 07:09:33,856 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 7500, loss[loss=0.0994, beats_loss=0.01263, ecapa_loss=0.0001447, whisper_loss=0.08532, over 22620.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01071, ecapa_loss=0.0001577, whisper_loss=0.09126, over 3826460.40 frames. ], batch size: 92, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:09:47,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2538720.0, ans=0.125 2024-08-14 07:09:50,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2538820.0, ans=0.0 2024-08-14 07:09:53,268 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-14 07:10:02,066 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
22 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-14 07:10:03,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2538820.0, ans=0.0 2024-08-14 07:10:17,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2538920.0, ans=0.04949747468305833 2024-08-14 07:10:27,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2539020.0, ans=0.125 2024-08-14 07:10:37,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2539120.0, ans=0.125 2024-08-14 07:10:39,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2539120.0, ans=0.125 2024-08-14 07:10:42,034 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 19 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 07:10:54,718 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 7550, loss[loss=0.1145, beats_loss=0.009102, ecapa_loss=0.0001751, whisper_loss=0.1036, over 18915.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01072, ecapa_loss=0.0001588, whisper_loss=0.0909, over 3830242.65 frames. 
], batch size: 75, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:10:56,859 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.051e-03 2024-08-14 07:11:05,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2539220.0, ans=0.0 2024-08-14 07:11:46,483 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.298e+01 2.593e+01 2.946e+01 4.435e+01, threshold=5.186e+01, percent-clipped=0.0 2024-08-14 07:12:01,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2539620.0, ans=0.1 2024-08-14 07:12:06,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2539620.0, ans=0.125 2024-08-14 07:12:10,242 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 29 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-14 07:12:13,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2539620.0, ans=0.2 2024-08-14 07:12:15,444 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 7600, loss[loss=0.1126, beats_loss=0.009298, ecapa_loss=0.0001708, whisper_loss=0.1016, over 17140.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0107, ecapa_loss=0.0001586, whisper_loss=0.09065, over 3850312.04 frames. 
], batch size: 67, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:12:32,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2539820.0, ans=0.1 2024-08-14 07:12:39,493 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 07:13:13,990 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.57 vs. limit=15.0 2024-08-14 07:13:14,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2540020.0, ans=0.0 2024-08-14 07:13:21,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2540120.0, ans=0.125 2024-08-14 07:13:26,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2540120.0, ans=0.125 2024-08-14 07:13:33,822 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 7650, loss[loss=0.1186, beats_loss=0.00833, ecapa_loss=0.0001648, whisper_loss=0.1087, over 20162.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01066, ecapa_loss=0.0001586, whisper_loss=0.09036, over 3854486.60 frames. ], batch size: 77, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:13:39,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2540220.0, ans=0.1 2024-08-14 07:13:40,863 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
33 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 07:14:09,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2540420.0, ans=0.1 2024-08-14 07:14:14,545 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 34 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-14 07:14:25,150 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.299e+01 2.494e+01 2.918e+01 5.997e+01, threshold=4.989e+01, percent-clipped=1.0 2024-08-14 07:14:32,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2540520.0, ans=0.0 2024-08-14 07:14:35,841 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 07:14:53,647 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 7700, loss[loss=0.1152, beats_loss=0.009853, ecapa_loss=0.0001497, whisper_loss=0.1039, over 23372.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01069, ecapa_loss=0.000159, whisper_loss=0.0903, over 3864547.14 frames. ], batch size: 94, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:15:03,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2540720.0, ans=0.1 2024-08-14 07:15:07,517 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0 2024-08-14 07:15:09,376 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
34 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-14 07:15:10,401 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2540820.0, ans=0.2 2024-08-14 07:15:30,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2540920.0, ans=0.1 2024-08-14 07:15:40,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2540920.0, ans=0.0 2024-08-14 07:15:41,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2541020.0, ans=0.0 2024-08-14 07:15:56,640 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.90 vs. limit=15.0 2024-08-14 07:16:12,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2541220.0, ans=0.2 2024-08-14 07:16:12,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2541220.0, ans=0.09899494936611666 2024-08-14 07:16:13,430 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 7750, loss[loss=0.07517, beats_loss=0.01163, ecapa_loss=0.0001478, whisper_loss=0.06207, over 22161.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01071, ecapa_loss=0.0001566, whisper_loss=0.09023, over 3871302.38 frames. ], batch size: 91, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:16:23,326 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-14 07:16:34,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2541320.0, ans=0.5 2024-08-14 07:16:37,804 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
25 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 07:16:44,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2541420.0, ans=0.2 2024-08-14 07:16:55,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2541420.0, ans=0.0 2024-08-14 07:16:57,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2541420.0, ans=0.09899494936611666 2024-08-14 07:17:04,257 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.379e+01 2.592e+01 2.812e+01 4.047e+01, threshold=5.183e+01, percent-clipped=0.0 2024-08-14 07:17:19,127 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-14 07:17:24,827 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-14 07:17:30,272 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 7800, loss[loss=0.1279, beats_loss=0.01052, ecapa_loss=0.0001185, whisper_loss=0.1162, over 19409.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01075, ecapa_loss=0.0001549, whisper_loss=0.08997, over 3884968.68 frames. 
], batch size: 72, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:17:30,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2541720.0, ans=0.09899494936611666 2024-08-14 07:17:38,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2541720.0, ans=0.07 2024-08-14 07:18:01,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2541920.0, ans=0.1 2024-08-14 07:18:32,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2542120.0, ans=0.1 2024-08-14 07:18:43,545 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 7850, loss[loss=0.09232, beats_loss=0.008378, ecapa_loss=0.0002072, whisper_loss=0.08187, over 17045.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01075, ecapa_loss=0.0001547, whisper_loss=0.09042, over 3885345.35 frames. ], batch size: 67, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:18:48,336 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 28 from Vox, 25 fro AS 2024-08-14 07:18:55,377 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 07:19:01,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2542320.0, ans=0.125 2024-08-14 07:19:28,105 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 07:19:28,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2542520.0, ans=0.2 2024-08-14 07:19:29,262 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.387e+01 2.602e+01 2.902e+01 1.105e+02, threshold=5.203e+01, percent-clipped=1.0 2024-08-14 07:19:29,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2542520.0, ans=0.125 2024-08-14 07:19:43,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2542620.0, ans=0.2 2024-08-14 07:19:54,408 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 7900, loss[loss=0.08696, beats_loss=0.01183, ecapa_loss=0.0001392, whisper_loss=0.07374, over 15047.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01084, ecapa_loss=0.000155, whisper_loss=0.09056, over 3885048.29 frames. ], batch size: 58, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:20:00,078 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 20 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-14 07:20:06,098 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-14 07:20:06,979 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.84 vs. 
limit=15.0 2024-08-14 07:20:15,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2542820.0, ans=0.0 2024-08-14 07:20:26,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2542920.0, ans=0.2 2024-08-14 07:20:28,857 WARNING [optim.py:496] (2/4) Scaling gradients by 0.04889056831598282, model_norm_threshold=52.03104019165039 2024-08-14 07:20:29,027 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.0.norm.log_scale with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.668e+05, grad_sumsq=1.668e+05, orig_rms_sq=1.000e+00 2024-08-14 07:20:32,213 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-14 07:20:39,153 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0 2024-08-14 07:20:43,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2543020.0, ans=0.0 2024-08-14 07:20:49,897 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 32 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 07:21:01,646 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.798e+01 2024-08-14 07:21:06,586 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 7950, loss[loss=0.112, beats_loss=0.007961, ecapa_loss=0.0002093, whisper_loss=0.1019, over 21397.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01089, ecapa_loss=0.000156, whisper_loss=0.09034, over 3901237.73 frames. 
], batch size: 91, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:21:10,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2543220.0, ans=0.2 2024-08-14 07:21:13,580 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.51 vs. limit=15.0 2024-08-14 07:21:14,623 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=3.975e-02 2024-08-14 07:21:22,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2543320.0, ans=0.125 2024-08-14 07:21:24,472 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 29 from Vox, 26 fro AS 2024-08-14 07:21:26,861 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.61 vs. limit=22.5 2024-08-14 07:21:30,483 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 16 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-14 07:21:38,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2543420.0, ans=0.2 2024-08-14 07:21:44,062 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 25 from Vox, 22 fro AS 2024-08-14 07:21:54,294 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.373e+01 2.668e+01 3.252e+01 1.064e+03, threshold=5.336e+01, percent-clipped=2.0 2024-08-14 07:21:57,086 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
19 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-14 07:21:58,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2543520.0, ans=0.125 2024-08-14 07:22:02,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2543520.0, ans=0.0 2024-08-14 07:22:08,550 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-14 07:22:14,167 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 07:22:19,735 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 8000, loss[loss=0.1131, beats_loss=0.01013, ecapa_loss=0.0001533, whisper_loss=0.1014, over 22890.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01083, ecapa_loss=0.0001563, whisper_loss=0.09057, over 3917963.11 frames. ], batch size: 90, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:22:39,477 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 30 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-14 07:22:42,278 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 07:22:54,086 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 30 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-14 07:23:18,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2544120.0, ans=0.125 2024-08-14 07:23:29,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2544120.0, ans=0.125 2024-08-14 07:23:33,344 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 8050, loss[loss=0.09925, beats_loss=0.009643, ecapa_loss=0.000159, whisper_loss=0.08802, over 15596.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01086, ecapa_loss=0.0001558, whisper_loss=0.09036, over 3893920.15 frames. 
], batch size: 60, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:23:51,360 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-14 07:24:01,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2544420.0, ans=0.125 2024-08-14 07:24:11,553 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 25 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-14 07:24:20,013 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.420e+01 2.579e+01 3.062e+01 1.369e+02, threshold=5.158e+01, percent-clipped=2.0 2024-08-14 07:24:45,545 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 8100, loss[loss=0.08821, beats_loss=0.01294, ecapa_loss=0.0001411, whisper_loss=0.07386, over 22781.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01084, ecapa_loss=0.0001557, whisper_loss=0.09059, over 3887080.55 frames. ], batch size: 94, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:24:47,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2544720.0, ans=0.09899494936611666 2024-08-14 07:24:56,159 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.306e-02 2024-08-14 07:25:01,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2544820.0, ans=0.95 2024-08-14 07:25:02,055 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.42 vs. 
limit=6.0 2024-08-14 07:25:12,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2544820.0, ans=0.1 2024-08-14 07:25:27,217 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.82 vs. limit=15.0 2024-08-14 07:25:38,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2545020.0, ans=0.2 2024-08-14 07:25:38,733 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.75 vs. limit=15.0 2024-08-14 07:25:52,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2545120.0, ans=0.1 2024-08-14 07:25:55,750 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 8150, loss[loss=0.1128, beats_loss=0.01047, ecapa_loss=0.0001474, whisper_loss=0.1009, over 22345.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01081, ecapa_loss=0.0001565, whisper_loss=0.09005, over 3853558.65 frames. ], batch size: 88, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:26:09,951 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-14 07:26:16,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2545320.0, ans=0.125 2024-08-14 07:26:39,844 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
27 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-14 07:26:40,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2545520.0, ans=0.0 2024-08-14 07:26:41,160 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.412e+01 2.672e+01 3.051e+01 4.273e+01, threshold=5.344e+01, percent-clipped=0.0 2024-08-14 07:26:43,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2545520.0, ans=0.125 2024-08-14 07:26:43,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2545520.0, ans=0.125 2024-08-14 07:27:04,901 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.89 vs. limit=15.0 2024-08-14 07:27:05,463 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-14 07:27:06,481 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 8200, loss[loss=0.1082, beats_loss=0.01053, ecapa_loss=0.0001783, whisper_loss=0.09591, over 21462.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0108, ecapa_loss=0.0001572, whisper_loss=0.09048, over 3877953.24 frames. ], batch size: 91, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:27:19,953 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 07:27:20,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2545820.0, ans=0.125 2024-08-14 07:27:50,499 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.43 vs. 
limit=15.0 2024-08-14 07:28:01,693 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2546020.0, ans=0.125 2024-08-14 07:28:02,767 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-14 07:28:12,690 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-14 07:28:13,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2546120.0, ans=0.125 2024-08-14 07:28:18,246 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 8250, loss[loss=0.1229, beats_loss=0.01094, ecapa_loss=0.0001475, whisper_loss=0.1105, over 23793.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01084, ecapa_loss=0.0001575, whisper_loss=0.09052, over 3875614.53 frames. ], batch size: 94, lr: 3.47e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:28:29,462 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 16 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 07:28:36,619 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 07:28:54,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2546420.0, ans=0.125 2024-08-14 07:28:54,362 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.60 vs. limit=12.0 2024-08-14 07:29:01,026 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
18 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-14 07:29:03,853 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.321e+01 2.631e+01 2.915e+01 1.588e+02, threshold=5.262e+01, percent-clipped=1.0 2024-08-14 07:29:10,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2546520.0, ans=0.2 2024-08-14 07:29:30,094 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.09 vs. limit=15.0 2024-08-14 07:29:32,256 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 8300, loss[loss=0.1175, beats_loss=0.01161, ecapa_loss=0.0001672, whisper_loss=0.1042, over 22806.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01081, ecapa_loss=0.0001569, whisper_loss=0.09042, over 3887238.73 frames. ], batch size: 93, lr: 3.47e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:29:37,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2546720.0, ans=0.125 2024-08-14 07:30:04,171 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.09 vs. limit=15.0 2024-08-14 07:30:27,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2547020.0, ans=0.125 2024-08-14 07:30:34,539 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-14 07:30:48,542 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 8350, loss[loss=0.1094, beats_loss=0.009228, ecapa_loss=0.0001554, whisper_loss=0.09863, over 17932.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0108, ecapa_loss=0.0001568, whisper_loss=0.09089, over 3891335.73 frames. 
], batch size: 71, lr: 3.47e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:31:00,747 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.93 vs. limit=22.5 2024-08-14 07:31:02,664 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-14 07:31:14,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2547320.0, ans=0.0 2024-08-14 07:31:17,972 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-14 07:31:29,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2547420.0, ans=0.2 2024-08-14 07:31:35,334 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.329e+01 2.543e+01 2.806e+01 3.860e+01, threshold=5.087e+01, percent-clipped=0.0 2024-08-14 07:31:35,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2547520.0, ans=0.1 2024-08-14 07:31:54,676 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 22 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-14 07:31:55,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2547620.0, ans=0.125 2024-08-14 07:32:01,804 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 8400, loss[loss=0.1194, beats_loss=0.009172, ecapa_loss=0.0001222, whisper_loss=0.109, over 22566.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01072, ecapa_loss=0.0001571, whisper_loss=0.09162, over 3892989.28 frames. ], batch size: 82, lr: 3.47e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:32:26,752 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.72 vs. 
limit=10.0 2024-08-14 07:32:37,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2547920.0, ans=0.1 2024-08-14 07:32:44,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2548020.0, ans=0.0 2024-08-14 07:32:50,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2548020.0, ans=0.0 2024-08-14 07:32:51,174 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.06 vs. limit=15.0 2024-08-14 07:33:01,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2548120.0, ans=0.2 2024-08-14 07:33:08,545 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 07:33:13,947 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 8450, loss[loss=0.1013, beats_loss=0.01003, ecapa_loss=0.0001793, whisper_loss=0.08946, over 18790.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01065, ecapa_loss=0.000158, whisper_loss=0.09164, over 3864008.59 frames. ], batch size: 80, lr: 3.47e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:33:19,236 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.80 vs. 
limit=12.0 2024-08-14 07:33:26,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2548220.0, ans=0.2 2024-08-14 07:33:28,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2548320.0, ans=0.1 2024-08-14 07:33:39,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2548320.0, ans=0.125 2024-08-14 07:33:47,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2548420.0, ans=0.0 2024-08-14 07:33:48,026 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.66 vs. limit=22.5 2024-08-14 07:33:52,829 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-14 07:33:59,489 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.373e+01 2.582e+01 3.046e+01 4.610e+01, threshold=5.164e+01, percent-clipped=0.0 2024-08-14 07:33:59,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2548520.0, ans=0.0 2024-08-14 07:34:06,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2548520.0, ans=0.125 2024-08-14 07:34:13,302 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 15 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-14 07:34:25,514 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 8500, loss[loss=0.09894, beats_loss=0.007557, ecapa_loss=0.000173, whisper_loss=0.08965, over 14169.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01069, ecapa_loss=0.0001589, whisper_loss=0.09106, over 3867898.53 frames. 
], batch size: 55, lr: 3.47e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:34:30,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2548720.0, ans=0.125 2024-08-14 07:34:31,662 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 33 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-14 07:34:31,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2548720.0, ans=0.0 2024-08-14 07:34:33,814 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.24 vs. limit=10.0 2024-08-14 07:34:44,182 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 22 from LS+wenet, 15 from Vox, 58 fro AS 2024-08-14 07:34:58,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2548920.0, ans=0.125 2024-08-14 07:34:59,965 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.87 vs. limit=12.0 2024-08-14 07:35:13,439 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-14 07:35:16,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2549020.0, ans=0.0 2024-08-14 07:35:19,186 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 21 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-14 07:35:35,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2549220.0, ans=0.1 2024-08-14 07:35:36,239 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 8550, loss[loss=0.09849, beats_loss=0.01132, ecapa_loss=0.000144, whisper_loss=0.08573, over 21780.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01067, ecapa_loss=0.000159, whisper_loss=0.09102, over 3862957.11 frames. ], batch size: 89, lr: 3.47e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:35:37,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2549220.0, ans=0.1 2024-08-14 07:35:41,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2549220.0, ans=0.125 2024-08-14 07:36:02,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2549320.0, ans=0.125 2024-08-14 07:36:21,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2549520.0, ans=0.125 2024-08-14 07:36:22,560 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.413e+01 2.608e+01 2.932e+01 1.178e+02, threshold=5.217e+01, percent-clipped=2.0 2024-08-14 07:36:27,405 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 28 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 07:36:39,479 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.57 vs. limit=15.0 2024-08-14 07:36:44,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2549620.0, ans=0.125 2024-08-14 07:36:48,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2549620.0, ans=0.125 2024-08-14 07:36:49,375 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
17 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-14 07:36:50,409 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 8600, loss[loss=0.09911, beats_loss=0.01084, ecapa_loss=0.0001531, whisper_loss=0.08674, over 14151.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01064, ecapa_loss=0.0001591, whisper_loss=0.0919, over 3866556.01 frames. ], batch size: 55, lr: 3.47e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:37:00,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2549720.0, ans=0.0 2024-08-14 07:37:15,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2549820.0, ans=0.125 2024-08-14 07:37:22,450 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.19 vs. limit=5.0 2024-08-14 07:37:28,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2549920.0, ans=0.0 2024-08-14 07:37:32,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2549920.0, ans=0.125 2024-08-14 07:37:35,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2549920.0, ans=0.125 2024-08-14 07:37:59,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2550120.0, ans=0.125 2024-08-14 07:38:09,090 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 8650, loss[loss=0.1155, beats_loss=0.007006, ecapa_loss=0.0001763, whisper_loss=0.1067, over 16307.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01065, ecapa_loss=0.0001604, whisper_loss=0.09173, over 3852901.31 frames. 
], batch size: 63, lr: 3.46e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:38:26,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2550320.0, ans=0.125 2024-08-14 07:38:27,256 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 14 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 07:38:48,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2550420.0, ans=0.125 2024-08-14 07:38:56,361 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.324e+01 2.530e+01 2.821e+01 3.799e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-14 07:39:14,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2550620.0, ans=0.0 2024-08-14 07:39:17,835 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.86 vs. limit=12.0 2024-08-14 07:39:19,751 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-14 07:39:20,814 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 8700, loss[loss=0.1016, beats_loss=0.01004, ecapa_loss=0.0001884, whisper_loss=0.08964, over 18618.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01068, ecapa_loss=0.000159, whisper_loss=0.09164, over 3856888.14 frames. ], batch size: 78, lr: 3.46e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:39:22,404 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
16 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-14 07:39:29,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2550720.0, ans=0.0 2024-08-14 07:39:35,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2550820.0, ans=0.125 2024-08-14 07:39:42,759 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-14 07:39:56,537 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 07:40:00,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2550920.0, ans=0.125 2024-08-14 07:40:02,824 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.54 vs. limit=10.0 2024-08-14 07:40:25,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2551120.0, ans=0.125 2024-08-14 07:40:31,590 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 8750, loss[loss=0.128, beats_loss=0.009127, ecapa_loss=0.0001581, whisper_loss=0.1173, over 16761.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01066, ecapa_loss=0.0001576, whisper_loss=0.09131, over 3852955.06 frames. ], batch size: 66, lr: 3.46e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:40:48,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=2551320.0, ans=15.0 2024-08-14 07:40:51,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2551320.0, ans=0.1 2024-08-14 07:41:03,340 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
24 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-14 07:41:05,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2551420.0, ans=0.125 2024-08-14 07:41:10,146 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-14 07:41:16,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2551520.0, ans=0.125 2024-08-14 07:41:16,948 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.282e+01 2.583e+01 2.856e+01 3.464e+01, threshold=5.167e+01, percent-clipped=0.0 2024-08-14 07:41:29,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2551620.0, ans=0.125 2024-08-14 07:41:42,165 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 8800, loss[loss=0.1007, beats_loss=0.01147, ecapa_loss=0.0001478, whisper_loss=0.08771, over 17059.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0107, ecapa_loss=0.0001588, whisper_loss=0.09071, over 3855320.92 frames. ], batch size: 69, lr: 3.46e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:42:14,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2551920.0, ans=0.0 2024-08-14 07:42:23,433 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.27 vs. 
limit=15.0 2024-08-14 07:42:35,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=2552020.0, ans=6.0 2024-08-14 07:42:37,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2552020.0, ans=0.5 2024-08-14 07:42:54,283 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 8850, loss[loss=0.1156, beats_loss=0.01035, ecapa_loss=0.0001418, whisper_loss=0.1038, over 20764.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01078, ecapa_loss=0.000158, whisper_loss=0.08936, over 3831348.41 frames. ], batch size: 81, lr: 3.46e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:43:02,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2552220.0, ans=0.125 2024-08-14 07:43:14,332 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 31 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-14 07:43:16,952 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-14 07:43:21,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2552420.0, ans=0.1 2024-08-14 07:43:29,276 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.89 vs. limit=12.0 2024-08-14 07:43:39,539 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.372e+01 2.652e+01 3.112e+01 4.829e+01, threshold=5.305e+01, percent-clipped=0.0 2024-08-14 07:43:40,075 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 07:43:58,207 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 07:44:05,175 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 8900, loss[loss=0.1307, beats_loss=0.008591, ecapa_loss=0.0001351, whisper_loss=0.1207, over 17743.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01081, ecapa_loss=0.0001573, whisper_loss=0.08995, over 3860074.76 frames. ], batch size: 64, lr: 3.46e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:44:08,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2552720.0, ans=0.2 2024-08-14 07:44:17,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2552720.0, ans=0.125 2024-08-14 07:44:19,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2552820.0, ans=0.2 2024-08-14 07:44:24,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2552820.0, ans=0.125 2024-08-14 07:44:29,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2552820.0, ans=0.125 2024-08-14 07:44:41,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2552920.0, ans=0.1 2024-08-14 07:44:45,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2552920.0, ans=0.125 2024-08-14 07:44:46,668 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 07:44:46,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2552920.0, ans=0.125 2024-08-14 07:44:49,300 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
15 from LS+wenet, 18 from Vox, 27 from AS 2024-08-14 07:44:56,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2553020.0, ans=0.2 2024-08-14 07:44:58,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2553020.0, ans=0.125 2024-08-14 07:45:00,136 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.86 vs. limit=10.0 2024-08-14 07:45:02,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2553120.0, ans=0.125 2024-08-14 07:45:05,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2553120.0, ans=10.0 2024-08-14 07:45:07,510 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 17 from Vox, 25 from AS 2024-08-14 07:45:08,982 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 26 from LS+wenet, 15 from Vox, 31 from AS 2024-08-14 07:45:09,357 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2553120.0, ans=0.125 2024-08-14 07:45:15,767 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 21 from Vox, 29 from AS 2024-08-14 07:45:16,866 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 8950, loss[loss=0.1103, beats_loss=0.01001, ecapa_loss=0.0001728, whisper_loss=0.0986, over 18283.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01085, ecapa_loss=0.0001569, whisper_loss=0.08988, over 3857560.86 frames. ], batch size: 76, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:45:17,047 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts.
24 from LS+wenet, 25 from Vox, 25 from AS 2024-08-14 07:45:24,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2553220.0, ans=0.2 2024-08-14 07:45:56,901 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 from AS 2024-08-14 07:46:01,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2553520.0, ans=0.0 2024-08-14 07:46:03,605 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.450e+01 2.761e+01 3.148e+01 4.518e+01, threshold=5.522e+01, percent-clipped=0.0 2024-08-14 07:46:04,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2553520.0, ans=0.05 2024-08-14 07:46:13,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2553620.0, ans=6.0 2024-08-14 07:46:27,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2553720.0, ans=0.125 2024-08-14 07:46:27,837 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 9000, loss[loss=0.09021, beats_loss=0.01172, ecapa_loss=0.0001837, whisper_loss=0.07665, over 17837.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01077, ecapa_loss=0.0001581, whisper_loss=0.09009, over 3835516.25 frames. ], batch size: 75, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:46:27,838 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-14 07:47:08,567 INFO [train_multi_KD3.py:1149] (2/4) Epoch 18, validation on ASR_libri: loss=0.2528, beats_loss=0, ecapa_loss=0.0005502, whisper_loss=0.2473, over 922467.00 frames.
2024-08-14 07:47:28,555 INFO [train_multi_KD3.py:1149] (2/4) Epoch 18, validation on SV_voxceleb1: loss=0.004391, beats_loss=0, ecapa_loss=0.0004391, whisper_loss=0, over 939242.00 frames. 2024-08-14 07:49:28,286 INFO [train_multi_KD3.py:1149] (2/4) Epoch 18, validation on AT_audioset: loss=0.02358, beats_loss=0.02358, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 07:49:28,290 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-14 07:49:30,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2553720.0, ans=0.1 2024-08-14 07:49:30,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2553720.0, ans=0.1 2024-08-14 07:49:34,064 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 22 from Vox, 38 from AS 2024-08-14 07:49:47,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2553820.0, ans=0.1 2024-08-14 07:50:14,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2554020.0, ans=0.04949747468305833 2024-08-14 07:50:18,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2554020.0, ans=0.1 2024-08-14 07:50:21,092 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.94 vs. limit=22.5 2024-08-14 07:50:31,694 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 from AS 2024-08-14 07:50:38,781 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 9050, loss[loss=0.1083, beats_loss=0.01191, ecapa_loss=0.0001513, whisper_loss=0.09492, over 15450.00 frames.
], tot_loss[loss=0.1029, beats_loss=0.01086, ecapa_loss=0.0001568, whisper_loss=0.09045, over 3852306.07 frames. ], batch size: 60, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:50:49,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2554220.0, ans=0.125 2024-08-14 07:51:11,011 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 22 from Vox, 47 from AS 2024-08-14 07:51:17,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2554420.0, ans=0.125 2024-08-14 07:51:27,691 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.420e+01 2.680e+01 3.001e+01 5.357e+01, threshold=5.359e+01, percent-clipped=0.0 2024-08-14 07:51:37,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2554620.0, ans=0.125 2024-08-14 07:51:41,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2554620.0, ans=0.1 2024-08-14 07:51:55,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=2554720.0, ans=0.95 2024-08-14 07:51:55,918 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 9100, loss[loss=0.1144, beats_loss=0.0108, ecapa_loss=0.000132, whisper_loss=0.1023, over 25005.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01085, ecapa_loss=0.0001572, whisper_loss=0.09061, over 3861416.44 frames. ], batch size: 94, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:51:58,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2554720.0, ans=0.125 2024-08-14 07:52:02,256 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts.
23 from LS+wenet, 31 from Vox, 34 from AS 2024-08-14 07:52:41,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2555020.0, ans=0.1 2024-08-14 07:53:07,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2555120.0, ans=0.2 2024-08-14 07:53:11,743 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 9150, loss[loss=0.06835, beats_loss=0.0124, ecapa_loss=0.0001509, whisper_loss=0.05445, over 16032.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01086, ecapa_loss=0.0001572, whisper_loss=0.09077, over 3871750.46 frames. ], batch size: 66, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:53:34,647 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 28 from LS+wenet, 21 from Vox, 36 from AS 2024-08-14 07:53:47,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2555420.0, ans=0.2 2024-08-14 07:53:49,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2555420.0, ans=0.0 2024-08-14 07:53:58,001 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.064e+01 2.410e+01 2.700e+01 3.056e+01 6.075e+01, threshold=5.399e+01, percent-clipped=3.0 2024-08-14 07:53:59,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2555520.0, ans=0.125 2024-08-14 07:54:21,933 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 9200, loss[loss=0.07672, beats_loss=0.01238, ecapa_loss=0.0001358, whisper_loss=0.06298, over 19808.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0109, ecapa_loss=0.0001568, whisper_loss=0.09051, over 3885744.85 frames. ], batch size: 79, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:54:39,459 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts.
27 from LS+wenet, 14 from Vox, 22 from AS 2024-08-14 07:54:53,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2555920.0, ans=0.125 2024-08-14 07:55:04,419 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.90 vs. limit=15.0 2024-08-14 07:55:26,783 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.51 vs. limit=15.0 2024-08-14 07:55:33,302 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 9250, loss[loss=0.1032, beats_loss=0.009026, ecapa_loss=0.0001683, whisper_loss=0.09246, over 22236.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01084, ecapa_loss=0.0001571, whisper_loss=0.09079, over 3899077.79 frames. ], batch size: 91, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:55:42,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2556220.0, ans=0.125 2024-08-14 07:55:45,716 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.35 vs. limit=22.5 2024-08-14 07:55:48,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2556320.0, ans=0.2 2024-08-14 07:55:57,820 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 27 from Vox, 31 from AS 2024-08-14 07:56:12,798 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.60 vs.
limit=15.0 2024-08-14 07:56:20,125 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.326e+01 2.739e+01 3.130e+01 4.617e+01, threshold=5.478e+01, percent-clipped=0.0 2024-08-14 07:56:43,858 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 9300, loss[loss=0.07414, beats_loss=0.01115, ecapa_loss=0.0001674, whisper_loss=0.06132, over 18079.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01079, ecapa_loss=0.0001569, whisper_loss=0.09073, over 3933957.70 frames. ], batch size: 75, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:56:44,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2556720.0, ans=0.09899494936611666 2024-08-14 07:56:53,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2556720.0, ans=0.0 2024-08-14 07:56:57,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2556820.0, ans=0.125 2024-08-14 07:57:07,895 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=42.77 vs. limit=22.5 2024-08-14 07:57:36,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2557020.0, ans=0.2 2024-08-14 07:57:37,697 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 22 from Vox, 28 from AS 2024-08-14 07:57:45,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2557120.0, ans=0.125 2024-08-14 07:57:46,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2557120.0, ans=0.125 2024-08-14 07:57:55,096 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts.
20 from LS+wenet, 19 from Vox, 32 from AS 2024-08-14 07:57:56,468 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 9350, loss[loss=0.09852, beats_loss=0.01187, ecapa_loss=0.0001322, whisper_loss=0.08533, over 18175.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01079, ecapa_loss=0.0001568, whisper_loss=0.09056, over 3881336.71 frames. ], batch size: 71, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:58:15,126 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 28 from Vox, 37 from AS 2024-08-14 07:58:15,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2557320.0, ans=0.125 2024-08-14 07:58:26,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2557420.0, ans=0.125 2024-08-14 07:58:34,901 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.76 vs. limit=12.0 2024-08-14 07:58:42,304 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.320e+01 2.600e+01 2.954e+01 6.976e+01, threshold=5.200e+01, percent-clipped=1.0 2024-08-14 07:58:47,331 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts.
25 from LS+wenet, 17 from Vox, 42 from AS 2024-08-14 07:58:48,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2557520.0, ans=0.125 2024-08-14 07:58:56,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2557620.0, ans=0.125 2024-08-14 07:59:01,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2557620.0, ans=0.2 2024-08-14 07:59:01,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2557620.0, ans=0.125 2024-08-14 07:59:06,883 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 9400, loss[loss=0.09305, beats_loss=0.01098, ecapa_loss=0.0001737, whisper_loss=0.08032, over 18265.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01077, ecapa_loss=0.0001567, whisper_loss=0.09075, over 3879975.71 frames. ], batch size: 76, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:59:07,125 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 36 from LS+wenet, 15 from Vox, 37 from AS 2024-08-14 07:59:14,050 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.61 vs. limit=10.0 2024-08-14 07:59:16,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2557720.0, ans=0.0 2024-08-14 07:59:31,498 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 15 from Vox, 28 from AS 2024-08-14 07:59:34,393 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts.
25 from LS+wenet, 26 from Vox, 39 from AS 2024-08-14 07:59:38,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2557920.0, ans=0.125 2024-08-14 07:59:45,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2557920.0, ans=0.125 2024-08-14 08:00:06,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2558120.0, ans=0.1 2024-08-14 08:00:10,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2558120.0, ans=0.125 2024-08-14 08:00:17,366 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 9450, loss[loss=0.1032, beats_loss=0.009152, ecapa_loss=0.0001749, whisper_loss=0.09233, over 15112.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01076, ecapa_loss=0.0001563, whisper_loss=0.09138, over 3865598.65 frames. ], batch size: 61, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:00:30,509 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 15 from LS+wenet, 19 from Vox, 41 from AS 2024-08-14 08:01:04,819 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.392e+01 2.683e+01 2.998e+01 2.159e+02, threshold=5.366e+01, percent-clipped=1.0 2024-08-14 08:01:18,949 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 17 from Vox, 26 from AS 2024-08-14 08:01:20,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2558620.0, ans=0.0 2024-08-14 08:01:28,862 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 9500, loss[loss=0.1233, beats_loss=0.009103, ecapa_loss=0.0001597, whisper_loss=0.1126, over 20804.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01074, ecapa_loss=0.0001575, whisper_loss=0.09107, over 3872687.22 frames.
], batch size: 77, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:01:40,168 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 20 from Vox, 47 from AS 2024-08-14 08:02:09,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2558920.0, ans=10.0 2024-08-14 08:02:10,176 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 35 from LS+wenet, 24 from Vox, 35 from AS 2024-08-14 08:02:37,095 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 24 from Vox, 32 from AS 2024-08-14 08:02:39,867 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 9550, loss[loss=0.1111, beats_loss=0.008905, ecapa_loss=0.0001539, whisper_loss=0.1007, over 14236.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01064, ecapa_loss=0.000159, whisper_loss=0.09112, over 3861477.18 frames. ], batch size: 54, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:03:12,474 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 22 from Vox, 19 from AS 2024-08-14 08:03:14,189 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 15 from LS+wenet, 24 from Vox, 35 from AS 2024-08-14 08:03:18,207 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 23 from Vox, 33 from AS 2024-08-14 08:03:26,451 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.380e+01 2.664e+01 3.149e+01 1.810e+02, threshold=5.328e+01, percent-clipped=2.0 2024-08-14 08:03:44,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2559620.0, ans=0.0 2024-08-14 08:03:46,366 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 32 from LS+wenet, 23 from Vox, 31 from AS 2024-08-14 08:03:50,384 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 9600, loss[loss=0.1051, beats_loss=0.01051, ecapa_loss=0.0001715, whisper_loss=0.09286, over 20679.00 frames.
], tot_loss[loss=0.1029, beats_loss=0.01066, ecapa_loss=0.0001584, whisper_loss=0.09062, over 3861776.62 frames. ], batch size: 85, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:04:21,325 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 21 from Vox, 31 from AS 2024-08-14 08:04:40,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2560020.0, ans=0.0 2024-08-14 08:04:43,574 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 17 from Vox, 26 from AS 2024-08-14 08:04:51,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2560120.0, ans=0.2 2024-08-14 08:04:56,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2560120.0, ans=0.1 2024-08-14 08:05:06,096 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 9650, loss[loss=0.1086, beats_loss=0.009599, ecapa_loss=0.0001775, whisper_loss=0.0972, over 22411.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01064, ecapa_loss=0.0001593, whisper_loss=0.09071, over 3839791.16 frames. ], batch size: 90, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:05:07,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2560220.0, ans=0.2 2024-08-14 08:05:16,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2560220.0, ans=0.1 2024-08-14 08:05:20,512 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 24 from Vox, 45 from AS 2024-08-14 08:05:26,448 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts.
25 from LS+wenet, 25 from Vox, 24 from AS 2024-08-14 08:05:40,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2560420.0, ans=0.125 2024-08-14 08:05:42,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2560420.0, ans=0.125 2024-08-14 08:05:43,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2560420.0, ans=0.125 2024-08-14 08:05:45,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2560420.0, ans=0.0 2024-08-14 08:05:52,948 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.378e+01 2.605e+01 3.057e+01 7.649e+01, threshold=5.209e+01, percent-clipped=3.0 2024-08-14 08:06:16,728 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 9700, loss[loss=0.08437, beats_loss=0.01138, ecapa_loss=0.0001623, whisper_loss=0.07137, over 18083.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.0001613, whisper_loss=0.09034, over 3837382.85 frames. ], batch size: 73, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:06:44,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2560920.0, ans=0.025 2024-08-14 08:06:46,947 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts.
20 from LS+wenet, 19 from Vox, 36 from AS 2024-08-14 08:06:57,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2560920.0, ans=0.125 2024-08-14 08:07:00,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2561020.0, ans=0.125 2024-08-14 08:07:06,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2561020.0, ans=0.0 2024-08-14 08:07:28,655 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 9750, loss[loss=0.1134, beats_loss=0.01184, ecapa_loss=0.0001292, whisper_loss=0.1003, over 23480.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01067, ecapa_loss=0.0001591, whisper_loss=0.09017, over 3868211.80 frames. ], batch size: 91, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:07:34,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2561220.0, ans=0.0 2024-08-14 08:07:48,916 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 26 from Vox, 37 from AS 2024-08-14 08:08:05,056 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts.
32 from LS+wenet, 20 from Vox, 31 from AS 2024-08-14 08:08:16,146 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.202e+01 2.404e+01 2.626e+01 3.852e+01, threshold=4.808e+01, percent-clipped=0.0 2024-08-14 08:08:17,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2561520.0, ans=0.1 2024-08-14 08:08:25,337 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.216e+01 2024-08-14 08:08:26,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2561620.0, ans=0.0 2024-08-14 08:08:34,055 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.63 vs. limit=15.0 2024-08-14 08:08:39,910 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.90 vs. limit=12.0 2024-08-14 08:08:40,276 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 9800, loss[loss=0.09878, beats_loss=0.00974, ecapa_loss=0.0001312, whisper_loss=0.08772, over 23819.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0107, ecapa_loss=0.0001583, whisper_loss=0.08986, over 3877681.13 frames. ], batch size: 92, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:09:07,708 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 15 from Vox, 46 from AS 2024-08-14 08:09:16,639 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.22 vs.
limit=15.0 2024-08-14 08:09:28,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2562020.0, ans=0.0 2024-08-14 08:09:34,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2562020.0, ans=0.125 2024-08-14 08:09:36,951 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 18 from LS+wenet, 25 from Vox, 34 from AS 2024-08-14 08:09:40,779 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 21 from LS+wenet, 32 from Vox, 30 from AS 2024-08-14 08:09:50,582 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 9850, loss[loss=0.1136, beats_loss=0.008448, ecapa_loss=0.000212, whisper_loss=0.103, over 21795.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01065, ecapa_loss=0.0001583, whisper_loss=0.09093, over 3895057.85 frames. ], batch size: 91, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:09:53,648 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 26 from Vox, 33 from AS 2024-08-14 08:09:53,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2562220.0, ans=0.0 2024-08-14 08:10:05,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2562320.0, ans=0.0 2024-08-14 08:10:31,564 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 28 from LS+wenet, 23 from Vox, 31 from AS 2024-08-14 08:10:36,601 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.409e+01 2.672e+01 2.970e+01 5.427e+01, threshold=5.345e+01, percent-clipped=1.0 2024-08-14 08:10:36,915 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 28 from Vox, 36 from AS 2024-08-14 08:11:00,332 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 9900, loss[loss=0.09678, beats_loss=0.01012, ecapa_loss=0.0002241, whisper_loss=0.08443, over 21438.00 frames.
], tot_loss[loss=0.1032, beats_loss=0.01068, ecapa_loss=0.0001587, whisper_loss=0.0909, over 3895670.25 frames. ], batch size: 94, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:11:03,618 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 21 from Vox, 28 from AS 2024-08-14 08:11:05,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2562720.0, ans=0.0 2024-08-14 08:11:09,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=2562720.0, ans=0.05 2024-08-14 08:11:12,561 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.33 vs. limit=15.0 2024-08-14 08:11:18,006 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 08:11:20,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2562820.0, ans=0.125 2024-08-14 08:11:30,470 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 19 from Vox, 24 from AS 2024-08-14 08:11:35,781 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.76 vs. limit=12.0 2024-08-14 08:11:49,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2563020.0, ans=0.125 2024-08-14 08:11:51,444 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.48 vs. limit=10.0 2024-08-14 08:11:59,199 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 16 from Vox, 30 from AS 2024-08-14 08:12:01,764 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts.
18 from LS+wenet, 19 from Vox, 21 from AS 2024-08-14 08:12:11,598 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 9950, loss[loss=0.08847, beats_loss=0.01105, ecapa_loss=0.0001508, whisper_loss=0.07592, over 18704.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01072, ecapa_loss=0.0001584, whisper_loss=0.09058, over 3879719.86 frames. ], batch size: 74, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:12:15,454 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.91 vs. limit=22.5 2024-08-14 08:12:30,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2563320.0, ans=0.0 2024-08-14 08:12:33,707 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.18 vs. limit=6.0 2024-08-14 08:12:38,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2563420.0, ans=0.2 2024-08-14 08:12:54,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2563520.0, ans=0.125 2024-08-14 08:12:58,423 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.310e+01 2.516e+01 2.952e+01 4.420e+01, threshold=5.033e+01, percent-clipped=0.0 2024-08-14 08:13:01,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2563520.0, ans=0.125 2024-08-14 08:13:07,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2563620.0, ans=0.0 2024-08-14 08:13:17,730 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.30 vs.
limit=15.0 2024-08-14 08:13:21,260 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 15 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 08:13:22,574 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 10000, loss[loss=0.07453, beats_loss=0.01143, ecapa_loss=0.0001456, whisper_loss=0.06165, over 16488.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01066, ecapa_loss=0.0001577, whisper_loss=0.09047, over 3849218.55 frames. ], batch size: 65, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:13:41,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2563820.0, ans=0.0 2024-08-14 08:13:45,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2563820.0, ans=0.04949747468305833 2024-08-14 08:13:54,529 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.68 vs. limit=15.0 2024-08-14 08:13:57,853 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 08:13:58,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2563920.0, ans=0.125 2024-08-14 08:14:21,990 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.34 vs. limit=15.0 2024-08-14 08:14:22,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2564120.0, ans=0.125 2024-08-14 08:14:33,692 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 10050, loss[loss=0.08876, beats_loss=0.01093, ecapa_loss=0.0001408, whisper_loss=0.07643, over 17031.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01059, ecapa_loss=0.0001574, whisper_loss=0.09053, over 3823462.09 frames. 
], batch size: 67, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:14:41,436 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.46 vs. limit=15.0 2024-08-14 08:15:00,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2564320.0, ans=0.125 2024-08-14 08:15:00,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2564320.0, ans=0.1 2024-08-14 08:15:19,079 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.19 vs. limit=15.0 2024-08-14 08:15:22,200 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.332e+01 2.576e+01 2.987e+01 4.902e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-14 08:15:41,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=2564620.0, ans=0.05 2024-08-14 08:15:41,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2564620.0, ans=0.2 2024-08-14 08:15:43,055 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 08:15:45,881 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 24 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-14 08:15:47,026 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 10100, loss[loss=0.09134, beats_loss=0.0132, ecapa_loss=0.0001315, whisper_loss=0.07682, over 23317.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01065, ecapa_loss=0.0001578, whisper_loss=0.09019, over 3861820.22 frames. 
], batch size: 94, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:15:56,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2564720.0, ans=0.125 2024-08-14 08:16:00,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2564720.0, ans=0.125 2024-08-14 08:16:48,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2565020.0, ans=0.125 2024-08-14 08:16:54,122 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.00 vs. limit=15.0 2024-08-14 08:17:12,501 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 10150, loss[loss=0.1167, beats_loss=0.009898, ecapa_loss=0.0001332, whisper_loss=0.1055, over 23452.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01074, ecapa_loss=0.0001568, whisper_loss=0.08967, over 3863498.48 frames. ], batch size: 88, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:17:20,225 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-14 08:17:23,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2565220.0, ans=0.0 2024-08-14 08:17:46,720 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-14 08:17:48,993 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.655e-03 2024-08-14 08:18:07,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2565520.0, ans=0.125 2024-08-14 08:18:08,628 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.364e+01 2.632e+01 2.890e+01 4.484e+01, threshold=5.264e+01, percent-clipped=0.0 2024-08-14 08:18:09,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2565520.0, ans=0.125 2024-08-14 08:18:19,900 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 08:18:26,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2565620.0, ans=0.09899494936611666 2024-08-14 08:18:27,197 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-14 08:18:28,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2565620.0, ans=0.0 2024-08-14 08:18:36,647 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 10200, loss[loss=0.1058, beats_loss=0.01089, ecapa_loss=0.0001562, whisper_loss=0.0933, over 22746.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01072, ecapa_loss=0.0001565, whisper_loss=0.08969, over 3861247.37 frames. ], batch size: 92, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:19:17,076 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.41 vs. 
limit=22.5 2024-08-14 08:19:20,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2565920.0, ans=0.125 2024-08-14 08:19:21,358 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-14 08:19:27,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2565920.0, ans=0.125 2024-08-14 08:19:28,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2566020.0, ans=0.2 2024-08-14 08:19:30,862 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.49 vs. limit=15.0 2024-08-14 08:19:34,323 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.58 vs. limit=15.0 2024-08-14 08:19:54,268 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 24 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-14 08:20:04,429 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 10250, loss[loss=0.09683, beats_loss=0.009758, ecapa_loss=0.0001919, whisper_loss=0.08516, over 17610.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.0001569, whisper_loss=0.09088, over 3874492.32 frames. ], batch size: 73, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:20:05,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2566220.0, ans=0.125 2024-08-14 08:20:08,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2566220.0, ans=0.09899494936611666 2024-08-14 08:20:14,730 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
12 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-14 08:20:21,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2566320.0, ans=0.0 2024-08-14 08:20:41,575 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-14 08:20:43,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2566420.0, ans=0.125 2024-08-14 08:20:44,917 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-14 08:20:52,571 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 19 from LS+wenet, 20 from Vox, 15 fro AS 2024-08-14 08:20:54,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2566520.0, ans=0.0 2024-08-14 08:20:57,487 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.331e+01 2.528e+01 2.980e+01 4.721e+01, threshold=5.056e+01, percent-clipped=0.0 2024-08-14 08:21:19,344 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 08:21:26,754 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 10300, loss[loss=0.08256, beats_loss=0.01346, ecapa_loss=0.0001897, whisper_loss=0.06721, over 15161.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01061, ecapa_loss=0.0001572, whisper_loss=0.09096, over 3896708.20 frames. 
], batch size: 67, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:21:48,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2566820.0, ans=0.2 2024-08-14 08:21:49,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2566820.0, ans=0.1 2024-08-14 08:22:22,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2567020.0, ans=0.125 2024-08-14 08:22:29,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2567020.0, ans=0.1 2024-08-14 08:22:36,393 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 29 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 08:22:36,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=2567120.0, ans=0.5 2024-08-14 08:22:43,191 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-14 08:22:51,131 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 10350, loss[loss=0.124, beats_loss=0.008318, ecapa_loss=0.0001283, whisper_loss=0.1144, over 18260.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01063, ecapa_loss=0.0001572, whisper_loss=0.09052, over 3910362.68 frames. ], batch size: 67, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:23:08,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2567320.0, ans=0.2 2024-08-14 08:23:16,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2567320.0, ans=0.1 2024-08-14 08:23:17,395 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
19 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 08:23:51,936 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.354e+01 2.584e+01 2.935e+01 4.636e+01, threshold=5.168e+01, percent-clipped=0.0 2024-08-14 08:24:00,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2567520.0, ans=0.0 2024-08-14 08:24:22,228 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 10400, loss[loss=0.07881, beats_loss=0.01411, ecapa_loss=0.0001411, whisper_loss=0.06329, over 13620.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01075, ecapa_loss=0.0001553, whisper_loss=0.09029, over 3900653.84 frames. ], batch size: 55, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:24:29,538 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-14 08:24:30,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2567720.0, ans=0.125 2024-08-14 08:24:32,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2567720.0, ans=0.0 2024-08-14 08:24:39,671 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.08 vs. 
limit=12.0 2024-08-14 08:24:41,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2567820.0, ans=0.125 2024-08-14 08:25:24,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2568020.0, ans=0.1 2024-08-14 08:25:37,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=2568120.0, ans=0.025 2024-08-14 08:25:49,540 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 10450, loss[loss=0.1063, beats_loss=0.01281, ecapa_loss=0.0001331, whisper_loss=0.09214, over 23115.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01066, ecapa_loss=0.0001557, whisper_loss=0.09113, over 3911875.49 frames. ], batch size: 93, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:25:51,953 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 22 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-14 08:26:09,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2568320.0, ans=0.0 2024-08-14 08:26:12,637 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.06 vs. limit=12.0 2024-08-14 08:26:25,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2568420.0, ans=0.125 2024-08-14 08:26:26,181 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.52 vs. 
limit=22.5 2024-08-14 08:26:41,409 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.325e+01 2.620e+01 2.982e+01 4.539e+01, threshold=5.240e+01, percent-clipped=0.0 2024-08-14 08:27:05,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2568620.0, ans=0.0 2024-08-14 08:27:07,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2568720.0, ans=0.0 2024-08-14 08:27:08,344 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 10500, loss[loss=0.1116, beats_loss=0.01394, ecapa_loss=0.0001537, whisper_loss=0.09616, over 17602.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01064, ecapa_loss=0.0001569, whisper_loss=0.09121, over 3867506.37 frames. ], batch size: 69, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:27:12,553 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 21 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-14 08:27:18,517 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 27 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 08:27:31,897 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 29 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-14 08:28:04,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2569020.0, ans=0.125 2024-08-14 08:28:14,614 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.35 vs. limit=15.0 2024-08-14 08:28:25,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2569120.0, ans=0.125 2024-08-14 08:28:41,353 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.46 vs. 
limit=6.0 2024-08-14 08:28:41,525 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 10550, loss[loss=0.1097, beats_loss=0.009681, ecapa_loss=0.0001484, whisper_loss=0.09858, over 17595.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01067, ecapa_loss=0.0001578, whisper_loss=0.09055, over 3867808.86 frames. ], batch size: 68, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:28:48,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2569220.0, ans=0.0 2024-08-14 08:28:50,796 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-14 08:28:59,330 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-14 08:29:01,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2569320.0, ans=0.125 2024-08-14 08:29:03,708 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.401e+00 2024-08-14 08:29:17,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2569420.0, ans=0.125 2024-08-14 08:29:33,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2569420.0, ans=0.125 2024-08-14 08:29:38,610 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.23 vs. limit=15.0 2024-08-14 08:29:39,226 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
29 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-14 08:29:41,009 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.277e+01 2.540e+01 2.895e+01 1.094e+02, threshold=5.080e+01, percent-clipped=1.0 2024-08-14 08:29:47,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2569520.0, ans=0.125 2024-08-14 08:29:48,597 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-14 08:29:48,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2569520.0, ans=0.125 2024-08-14 08:29:48,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2569520.0, ans=0.125 2024-08-14 08:29:52,591 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-14 08:29:56,322 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-14 08:30:06,839 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 10600, loss[loss=0.118, beats_loss=0.008638, ecapa_loss=0.0001809, whisper_loss=0.1076, over 18662.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01065, ecapa_loss=0.0001575, whisper_loss=0.09036, over 3880987.81 frames. 
], batch size: 73, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:30:13,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2569720.0, ans=0.125 2024-08-14 08:30:22,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2569820.0, ans=0.125 2024-08-14 08:30:37,194 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.72 vs. limit=22.5 2024-08-14 08:30:41,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2569920.0, ans=0.0 2024-08-14 08:30:57,722 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-14 08:31:00,445 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 08:31:03,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2570020.0, ans=0.125 2024-08-14 08:31:09,241 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 08:31:09,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2570120.0, ans=0.0 2024-08-14 08:31:20,851 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 29 from LS+wenet, 7 from Vox, 19 fro AS 2024-08-14 08:31:22,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2570120.0, ans=0.125 2024-08-14 08:31:24,555 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 10650, loss[loss=0.1168, beats_loss=0.00991, ecapa_loss=0.0001599, whisper_loss=0.1053, over 18920.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01065, ecapa_loss=0.0001572, whisper_loss=0.09152, over 3910652.98 frames. ], batch size: 74, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:31:33,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2570220.0, ans=0.0 2024-08-14 08:31:36,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2570220.0, ans=0.125 2024-08-14 08:31:36,684 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=15.0 2024-08-14 08:31:47,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2570320.0, ans=0.0 2024-08-14 08:31:48,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2570320.0, ans=0.125 2024-08-14 08:31:57,396 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.97 vs. limit=15.0 2024-08-14 08:31:59,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2570420.0, ans=0.125 2024-08-14 08:32:00,343 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
18 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-14 08:32:02,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2570420.0, ans=0.0 2024-08-14 08:32:20,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2570520.0, ans=0.0 2024-08-14 08:32:21,096 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.360e+01 2.616e+01 3.033e+01 9.241e+01, threshold=5.233e+01, percent-clipped=1.0 2024-08-14 08:32:22,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2570520.0, ans=0.0 2024-08-14 08:32:25,224 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-14 08:32:39,980 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2024-08-14 08:32:46,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2570620.0, ans=0.0 2024-08-14 08:32:52,946 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 10700, loss[loss=0.1124, beats_loss=0.01034, ecapa_loss=0.0001684, whisper_loss=0.1004, over 22608.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01069, ecapa_loss=0.000157, whisper_loss=0.09114, over 3917032.10 frames. 
], batch size: 92, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:32:55,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2570720.0, ans=0.0 2024-08-14 08:33:11,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2570820.0, ans=0.125 2024-08-14 08:33:24,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2570820.0, ans=0.125 2024-08-14 08:33:54,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2571020.0, ans=0.1 2024-08-14 08:34:06,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2571120.0, ans=0.2 2024-08-14 08:34:21,010 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 10750, loss[loss=0.09969, beats_loss=0.009898, ecapa_loss=0.0001856, whisper_loss=0.08794, over 16846.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01065, ecapa_loss=0.0001578, whisper_loss=0.09175, over 3914286.68 frames. ], batch size: 68, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:34:30,270 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-14 08:34:30,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2571220.0, ans=0.125 2024-08-14 08:34:31,839 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-14 08:34:35,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2571320.0, ans=0.1 2024-08-14 08:34:36,754 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
21 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-14 08:34:53,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2571420.0, ans=0.1 2024-08-14 08:35:02,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2571420.0, ans=0.05 2024-08-14 08:35:13,233 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.456e+01 2.714e+01 3.010e+01 3.209e+02, threshold=5.428e+01, percent-clipped=1.0 2024-08-14 08:35:14,030 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.43 vs. limit=6.0 2024-08-14 08:35:38,307 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 10800, loss[loss=0.1022, beats_loss=0.009109, ecapa_loss=0.0001925, whisper_loss=0.0912, over 21601.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01073, ecapa_loss=0.0001574, whisper_loss=0.09111, over 3906699.85 frames. ], batch size: 88, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:35:43,653 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 08:35:50,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2571720.0, ans=0.125 2024-08-14 08:35:50,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2571720.0, ans=0.125 2024-08-14 08:36:00,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2571820.0, ans=0.0 2024-08-14 08:36:07,134 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
27 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 08:36:20,692 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0 2024-08-14 08:36:44,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2572120.0, ans=0.2 2024-08-14 08:36:45,262 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-14 08:36:52,585 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 10850, loss[loss=0.1068, beats_loss=0.01152, ecapa_loss=0.0001354, whisper_loss=0.0939, over 22664.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01079, ecapa_loss=0.0001576, whisper_loss=0.09084, over 3912821.44 frames. ], batch size: 93, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:36:53,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2572220.0, ans=0.0 2024-08-14 08:37:01,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2572220.0, ans=0.125 2024-08-14 08:37:02,982 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. limit=6.0 2024-08-14 08:37:21,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2572320.0, ans=0.125 2024-08-14 08:37:31,278 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.17 vs. 
limit=22.5 2024-08-14 08:37:35,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2572420.0, ans=0.2 2024-08-14 08:37:45,442 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.429e+01 2.769e+01 3.241e+01 1.860e+02, threshold=5.537e+01, percent-clipped=2.0 2024-08-14 08:37:50,256 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-14 08:38:17,625 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 10900, loss[loss=0.09623, beats_loss=0.01068, ecapa_loss=0.0001548, whisper_loss=0.084, over 22243.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01085, ecapa_loss=0.0001572, whisper_loss=0.09137, over 3925672.09 frames. ], batch size: 92, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:38:28,342 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 34 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-14 08:38:51,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2572820.0, ans=0.0 2024-08-14 08:38:52,763 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 08:39:07,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2572920.0, ans=0.0 2024-08-14 08:39:17,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2573020.0, ans=0.125 2024-08-14 08:39:17,548 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.563e-03 2024-08-14 08:39:28,329 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
20 from LS+wenet, 19 from Vox, 37 from AS 2024-08-14 08:39:28,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2573020.0, ans=0.125 2024-08-14 08:39:31,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2573120.0, ans=0.2 2024-08-14 08:39:37,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2573120.0, ans=0.0 2024-08-14 08:39:44,424 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 24 from LS+wenet, 21 from Vox, 37 from AS 2024-08-14 08:39:47,396 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 10950, loss[loss=0.1029, beats_loss=0.009865, ecapa_loss=0.0001655, whisper_loss=0.09136, over 22955.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01084, ecapa_loss=0.0001581, whisper_loss=0.0913, over 3954447.22 frames. ], batch size: 95, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:39:49,374 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=4.907e-01 2024-08-14 08:39:59,842 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 18 from Vox, 29 from AS 2024-08-14 08:40:06,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2573320.0, ans=0.125 2024-08-14 08:40:28,682 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 17 from Vox, 34 from AS 2024-08-14 08:40:35,967 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
24 from LS+wenet, 25 from Vox, 39 from AS 2024-08-14 08:40:37,046 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.432e+01 2.668e+01 2.934e+01 4.215e+01, threshold=5.335e+01, percent-clipped=0.0 2024-08-14 08:40:46,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2573520.0, ans=0.0 2024-08-14 08:40:47,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2573620.0, ans=0.0 2024-08-14 08:41:05,858 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 11000, loss[loss=0.08702, beats_loss=0.01155, ecapa_loss=0.0001425, whisper_loss=0.07405, over 17046.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01079, ecapa_loss=0.0001597, whisper_loss=0.09087, over 3930148.81 frames. ], batch size: 67, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:41:11,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2573720.0, ans=0.125 2024-08-14 08:41:23,802 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 15 from Vox, 44 from AS 2024-08-14 08:41:25,202 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.75 vs. limit=5.0 2024-08-14 08:41:30,378 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 13 from Vox, 29 from AS 2024-08-14 08:42:13,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2574020.0, ans=0.1 2024-08-14 08:42:25,641 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 15 from Vox, 27 from AS 2024-08-14 08:42:38,886 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 11050, loss[loss=0.08586, beats_loss=0.01371, ecapa_loss=0.0001468, whisper_loss=0.07068, over 18315.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01076, ecapa_loss=0.0001577, whisper_loss=0.09084, over 3926255.58 frames. ], batch size: 73, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:42:59,093 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.22 vs. limit=15.0 2024-08-14 08:43:06,442 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.50 vs. limit=6.0 2024-08-14 08:43:10,900 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 17 from Vox, 38 from AS 2024-08-14 08:43:30,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2574420.0, ans=10.0 2024-08-14 08:43:39,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2574520.0, ans=15.0 2024-08-14 08:43:41,588 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.313e+01 2.557e+01 2.807e+01 4.067e+01, threshold=5.114e+01, percent-clipped=0.0 2024-08-14 08:43:45,954 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 from AS 2024-08-14 08:43:58,493 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 19 from Vox, 34 from AS 2024-08-14 08:44:06,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2574620.0, ans=0.2 2024-08-14 08:44:12,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2574620.0, ans=0.0 2024-08-14 08:44:20,935 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 11100, loss[loss=0.1083, beats_loss=0.0101, ecapa_loss=0.0001677, whisper_loss=0.09651, over 21231.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.0107, ecapa_loss=0.0001592, whisper_loss=0.09062, over 3913704.17 frames. ], batch size: 87, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:44:52,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2574820.0, ans=0.125 2024-08-14 08:45:20,021 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 31 from LS+wenet, 17 from Vox, 31 from AS 2024-08-14 08:45:38,371 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 22 from Vox, 43 from AS 2024-08-14 08:45:55,271 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 24 from LS+wenet, 27 from Vox, 44 from AS 2024-08-14 08:46:06,979 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 26 from Vox, 35 from AS 2024-08-14 08:46:13,804 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 11150, loss[loss=0.08801, beats_loss=0.01105, ecapa_loss=0.0001343, whisper_loss=0.07562, over 13626.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01067, ecapa_loss=0.0001585, whisper_loss=0.09062, over 3885396.83 frames. ], batch size: 54, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:46:14,067 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 31 from LS+wenet, 24 from Vox, 26 from AS 2024-08-14 08:46:15,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2575220.0, ans=0.125 2024-08-14 08:46:20,846 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.87 vs. 
limit=15.0 2024-08-14 08:46:22,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2575220.0, ans=0.125 2024-08-14 08:46:58,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2575420.0, ans=0.125 2024-08-14 08:47:28,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2575520.0, ans=0.0 2024-08-14 08:47:29,595 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.286e+01 2.573e+01 3.032e+01 5.380e+01, threshold=5.147e+01, percent-clipped=1.0 2024-08-14 08:48:11,291 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 11200, loss[loss=0.1268, beats_loss=0.005074, ecapa_loss=0.0001871, whisper_loss=0.1199, over 14437.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01063, ecapa_loss=0.0001593, whisper_loss=0.09097, over 3869873.94 frames. ], batch size: 55, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:48:15,485 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 19 from Vox, 26 from AS 2024-08-14 08:48:36,702 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 20 from Vox, 37 from AS 2024-08-14 08:48:46,842 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.04 vs. limit=22.5 2024-08-14 08:48:54,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2575920.0, ans=0.125 2024-08-14 08:49:07,257 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 22 from LS+wenet, 15 from Vox, 25 from AS 2024-08-14 08:49:11,172 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
28 from LS+wenet, 27 from Vox, 24 from AS 2024-08-14 08:49:18,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2576020.0, ans=0.125 2024-08-14 08:49:23,483 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 19 from Vox, 36 from AS 2024-08-14 08:49:36,331 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 11250, loss[loss=0.1091, beats_loss=0.008087, ecapa_loss=0.0001776, whisper_loss=0.09926, over 16446.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01058, ecapa_loss=0.0001592, whisper_loss=0.0913, over 3837249.82 frames. ], batch size: 64, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:49:41,857 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 22 from LS+wenet, 24 from Vox, 40 from AS 2024-08-14 08:49:42,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2576220.0, ans=0.125 2024-08-14 08:49:57,800 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 17 from Vox, 30 from AS 2024-08-14 08:50:26,303 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.439e+01 2.714e+01 2.976e+01 1.044e+02, threshold=5.429e+01, percent-clipped=2.0 2024-08-14 08:50:33,066 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 19 from Vox, 31 from AS 2024-08-14 08:50:45,090 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.80 vs. limit=15.0 2024-08-14 08:50:54,416 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 11300, loss[loss=0.1058, beats_loss=0.007479, ecapa_loss=0.000185, whisper_loss=0.09647, over 14933.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01056, ecapa_loss=0.0001596, whisper_loss=0.09181, over 3900857.60 frames. 
], batch size: 60, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:51:16,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2576820.0, ans=0.1 2024-08-14 08:51:28,826 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2576920.0, ans=0.125 2024-08-14 08:51:43,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2577020.0, ans=0.0 2024-08-14 08:51:46,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2577020.0, ans=0.1 2024-08-14 08:51:49,387 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 30 from Vox, 41 from AS 2024-08-14 08:51:58,985 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 25 from LS+wenet, 18 from Vox, 27 from AS 2024-08-14 08:52:16,124 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 11350, loss[loss=0.1161, beats_loss=0.00955, ecapa_loss=0.0001683, whisper_loss=0.1049, over 22547.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01058, ecapa_loss=0.00016, whisper_loss=0.0917, over 3901472.71 frames. ], batch size: 93, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:52:16,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2577220.0, ans=0.125 2024-08-14 08:52:28,728 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.90 vs. limit=15.0 2024-08-14 08:52:30,219 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=12.01 vs. limit=12.0 2024-08-14 08:52:32,371 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
18 from LS+wenet, 19 from Vox, 21 from AS 2024-08-14 08:53:11,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2577520.0, ans=0.125 2024-08-14 08:53:14,114 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.319e+01 2.550e+01 2.857e+01 6.146e+01, threshold=5.100e+01, percent-clipped=1.0 2024-08-14 08:53:21,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2577520.0, ans=0.2 2024-08-14 08:53:31,635 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.34 vs. limit=22.5 2024-08-14 08:53:31,795 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.36 vs. limit=15.0 2024-08-14 08:53:40,270 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 11400, loss[loss=0.09602, beats_loss=0.0111, ecapa_loss=0.0001756, whisper_loss=0.08317, over 20301.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01052, ecapa_loss=0.0001591, whisper_loss=0.09222, over 3889779.55 frames. ], batch size: 84, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:53:56,475 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2577820.0, ans=0.1 2024-08-14 08:54:14,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2577920.0, ans=0.125 2024-08-14 08:54:15,681 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
37 from LS+wenet, 26 from Vox, 30 from AS 2024-08-14 08:54:16,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2577920.0, ans=0.1 2024-08-14 08:54:25,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2577920.0, ans=0.125 2024-08-14 08:54:28,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2578020.0, ans=0.2 2024-08-14 08:54:32,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2578020.0, ans=0.125 2024-08-14 08:54:38,232 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 21 from Vox, 37 from AS 2024-08-14 08:54:50,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2578120.0, ans=0.125 2024-08-14 08:54:55,619 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 27 from LS+wenet, 18 from Vox, 33 from AS 2024-08-14 08:54:58,269 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 11450, loss[loss=0.08791, beats_loss=0.01379, ecapa_loss=0.0001372, whisper_loss=0.07276, over 17046.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01053, ecapa_loss=0.0001582, whisper_loss=0.09211, over 3884593.24 frames. ], batch size: 70, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:55:02,886 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 23 from Vox, 44 from AS 2024-08-14 08:55:05,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2578220.0, ans=0.0 2024-08-14 08:55:23,468 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
32 from LS+wenet, 24 from Vox, 33 from AS 2024-08-14 08:55:36,192 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.10 vs. limit=15.0 2024-08-14 08:55:48,037 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.444e+01 2.679e+01 2.887e+01 5.368e+01, threshold=5.358e+01, percent-clipped=1.0 2024-08-14 08:55:49,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2578520.0, ans=0.0 2024-08-14 08:55:52,054 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 17 from LS+wenet, 21 from Vox, 33 from AS 2024-08-14 08:55:52,977 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 08:56:02,075 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 from AS 2024-08-14 08:56:13,508 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 11500, loss[loss=0.1062, beats_loss=0.007284, ecapa_loss=0.0001669, whisper_loss=0.09722, over 16789.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01051, ecapa_loss=0.0001573, whisper_loss=0.0921, over 3887446.29 frames. ], batch size: 62, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:56:30,036 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 18 from Vox, 30 from AS 2024-08-14 08:56:35,337 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 08:56:38,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2578820.0, ans=0.0 2024-08-14 08:56:52,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2578920.0, ans=0.1 2024-08-14 08:56:56,089 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
28 from LS+wenet, 23 from Vox, 36 from AS 2024-08-14 08:57:02,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2579020.0, ans=0.0 2024-08-14 08:57:19,153 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.84 vs. limit=15.0 2024-08-14 08:57:27,501 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 from AS 2024-08-14 08:57:28,724 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 11550, loss[loss=0.09275, beats_loss=0.0115, ecapa_loss=0.0001704, whisper_loss=0.07955, over 21864.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01054, ecapa_loss=0.0001573, whisper_loss=0.09182, over 3898082.93 frames. ], batch size: 90, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:57:29,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2579220.0, ans=0.125 2024-08-14 08:57:30,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2579220.0, ans=0.0 2024-08-14 08:57:51,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2579320.0, ans=0.1 2024-08-14 08:57:57,422 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.98 vs. limit=12.0 2024-08-14 08:58:18,807 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.371e+01 2.639e+01 3.011e+01 4.840e+01, threshold=5.278e+01, percent-clipped=0.0 2024-08-14 08:58:32,528 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.33 vs. 
limit=15.0 2024-08-14 08:58:33,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2579620.0, ans=0.125 2024-08-14 08:58:38,705 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 23 from LS+wenet, 22 from Vox, 46 from AS 2024-08-14 08:58:40,880 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0 2024-08-14 08:58:41,439 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 11600, loss[loss=0.09108, beats_loss=0.01051, ecapa_loss=0.0001441, whisper_loss=0.07912, over 23262.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01061, ecapa_loss=0.0001563, whisper_loss=0.09097, over 3935097.72 frames. ], batch size: 95, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:58:47,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2579720.0, ans=0.0 2024-08-14 08:58:58,758 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.11 vs. limit=22.5 2024-08-14 08:59:06,641 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 24 from Vox, 28 from AS 2024-08-14 08:59:08,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2579820.0, ans=0.125 2024-08-14 08:59:19,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2579920.0, ans=0.0 2024-08-14 08:59:53,130 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 11650, loss[loss=0.1124, beats_loss=0.01003, ecapa_loss=0.0001591, whisper_loss=0.1008, over 19518.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01058, ecapa_loss=0.0001567, whisper_loss=0.09163, over 3940448.80 frames. 
], batch size: 79, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:59:57,813 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 from AS 2024-08-14 09:00:06,937 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.01 vs. limit=22.5 2024-08-14 09:00:15,635 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.71 vs. limit=15.0 2024-08-14 09:00:26,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2580420.0, ans=0.125 2024-08-14 09:00:41,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2580520.0, ans=0.125 2024-08-14 09:00:41,778 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.343e+01 2.621e+01 2.855e+01 6.176e+01, threshold=5.243e+01, percent-clipped=1.0 2024-08-14 09:00:52,096 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 30 from Vox, 41 from AS 2024-08-14 09:00:56,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2580620.0, ans=0.125 2024-08-14 09:01:06,916 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 11700, loss[loss=0.09394, beats_loss=0.01325, ecapa_loss=0.0001021, whisper_loss=0.07967, over 23692.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0107, ecapa_loss=0.0001563, whisper_loss=0.09132, over 3939921.61 frames. ], batch size: 92, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:01:08,709 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
39 from LS+wenet, 16 from Vox, 30 from AS 2024-08-14 09:01:09,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2580720.0, ans=0.125 2024-08-14 09:01:13,757 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.49 vs. limit=6.0 2024-08-14 09:01:36,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2580920.0, ans=0.0 2024-08-14 09:02:01,721 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 17 from Vox, 38 from AS 2024-08-14 09:02:06,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2581120.0, ans=0.0 2024-08-14 09:02:06,443 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.97 vs. limit=15.0 2024-08-14 09:02:09,865 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 19 from Vox, 40 from AS 2024-08-14 09:02:18,081 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 11750, loss[loss=0.1148, beats_loss=0.009376, ecapa_loss=0.0001349, whisper_loss=0.104, over 17509.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01076, ecapa_loss=0.0001552, whisper_loss=0.0919, over 3980027.47 frames. ], batch size: 64, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:02:26,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2581220.0, ans=0.1 2024-08-14 09:02:26,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2581220.0, ans=0.5 2024-08-14 09:02:32,929 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
26 from LS+wenet, 16 from Vox, 46 from AS 2024-08-14 09:02:39,890 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.99 vs. limit=22.5 2024-08-14 09:02:44,540 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 19 from Vox, 22 from AS 2024-08-14 09:02:53,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2581420.0, ans=0.0 2024-08-14 09:03:07,028 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.345e+01 2.656e+01 2.861e+01 8.705e+01, threshold=5.311e+01, percent-clipped=2.0 2024-08-14 09:03:07,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2581520.0, ans=0.0 2024-08-14 09:03:10,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2581520.0, ans=0.0 2024-08-14 09:03:17,183 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 14 from LS+wenet, 17 from Vox, 26 from AS 2024-08-14 09:03:24,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2581620.0, ans=0.125 2024-08-14 09:03:28,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2581620.0, ans=0.05 2024-08-14 09:03:29,156 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 23 from Vox, 35 from AS 2024-08-14 09:03:30,095 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 11800, loss[loss=0.1206, beats_loss=0.009515, ecapa_loss=0.0001368, whisper_loss=0.1097, over 23633.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01075, ecapa_loss=0.0001556, whisper_loss=0.09225, over 3952319.28 frames. 
], batch size: 92, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:03:37,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2581720.0, ans=0.125 2024-08-14 09:04:00,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2581820.0, ans=0.125 2024-08-14 09:04:11,415 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.53 vs. limit=15.0 2024-08-14 09:04:20,555 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 14 from Vox, 22 from AS 2024-08-14 09:04:39,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2582120.0, ans=0.0 2024-08-14 09:04:58,264 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 11850, loss[loss=0.126, beats_loss=0.01012, ecapa_loss=0.0001438, whisper_loss=0.1144, over 23669.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01077, ecapa_loss=0.0001541, whisper_loss=0.09231, over 3928528.26 frames. ], batch size: 92, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:05:13,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2582220.0, ans=0.07 2024-08-14 09:05:23,981 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.76 vs. 
limit=15.0 2024-08-14 09:05:28,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2582320.0, ans=0.125 2024-08-14 09:05:41,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2582420.0, ans=0.0 2024-08-14 09:05:46,298 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 36 from LS+wenet, 19 from Vox, 39 from AS 2024-08-14 09:05:55,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2582520.0, ans=0.0 2024-08-14 09:06:01,226 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.374e+01 2.631e+01 2.932e+01 6.705e+01, threshold=5.263e+01, percent-clipped=1.0 2024-08-14 09:06:01,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2582520.0, ans=0.2 2024-08-14 09:06:05,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2582520.0, ans=0.1 2024-08-14 09:06:22,861 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 31 from Vox, 33 from AS 2024-08-14 09:06:30,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2582720.0, ans=0.1 2024-08-14 09:06:30,908 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 11900, loss[loss=0.1209, beats_loss=0.008045, ecapa_loss=0.0001644, whisper_loss=0.1112, over 23550.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01081, ecapa_loss=0.0001548, whisper_loss=0.0924, over 3958360.63 frames. ], batch size: 94, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:06:34,047 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
18 from LS+wenet, 16 from Vox, 30 from AS 2024-08-14 09:06:40,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2582720.0, ans=0.125 2024-08-14 09:06:51,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2582820.0, ans=0.0 2024-08-14 09:07:06,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2582820.0, ans=0.125 2024-08-14 09:07:09,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2582920.0, ans=0.0 2024-08-14 09:07:32,039 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 17 from Vox, 30 from AS 2024-08-14 09:07:41,833 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 14 from Vox, 33 from AS 2024-08-14 09:07:45,643 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 25 from Vox, 34 from AS 2024-08-14 09:07:58,344 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 24 from Vox, 31 from AS 2024-08-14 09:08:01,790 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 11950, loss[loss=0.1217, beats_loss=0.01106, ecapa_loss=0.0001538, whisper_loss=0.1091, over 22563.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01079, ecapa_loss=0.0001567, whisper_loss=0.09158, over 3938215.39 frames. ], batch size: 91, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:08:08,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2583220.0, ans=0.125 2024-08-14 09:08:25,545 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 21 from LS+wenet, 25 from Vox, 44 from AS 2024-08-14 09:08:35,550 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
26 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-14 09:08:53,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2583520.0, ans=0.125 2024-08-14 09:08:55,746 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.350e+01 2.661e+01 2.995e+01 4.363e+01, threshold=5.322e+01, percent-clipped=0.0 2024-08-14 09:08:58,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2583520.0, ans=0.2 2024-08-14 09:09:22,791 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 12000, loss[loss=0.08834, beats_loss=0.008599, ecapa_loss=0.0001724, whisper_loss=0.07802, over 16141.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01083, ecapa_loss=0.0001553, whisper_loss=0.09079, over 3929150.51 frames. ], batch size: 63, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:09:22,792 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-14 09:10:00,993 INFO [train_multi_KD3.py:1149] (2/4) Epoch 18, validation on ASR_libri: loss=0.2533, beats_loss=0, ecapa_loss=0.0005459, whisper_loss=0.2479, over 922467.00 frames. 2024-08-14 09:10:19,242 INFO [train_multi_KD3.py:1149] (2/4) Epoch 18, validation on SV_voxceleb1: loss=0.004372, beats_loss=0, ecapa_loss=0.0004372, whisper_loss=0, over 939242.00 frames. 2024-08-14 09:12:09,305 INFO [train_multi_KD3.py:1149] (2/4) Epoch 18, validation on AT_audioset: loss=0.02349, beats_loss=0.02349, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 09:12:09,309 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-14 09:12:31,515 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 30 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-14 09:12:35,713 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.93 vs. 
limit=15.0 2024-08-14 09:12:39,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2583920.0, ans=0.0 2024-08-14 09:12:41,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2583920.0, ans=0.125 2024-08-14 09:12:46,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2583920.0, ans=0.125 2024-08-14 09:12:53,017 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.33 vs. limit=15.0 2024-08-14 09:12:54,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2583920.0, ans=0.1 2024-08-14 09:13:00,263 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-14 09:13:03,426 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-14 09:13:10,292 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 09:13:11,856 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-14 09:13:27,576 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 12050, loss[loss=0.107, beats_loss=0.006871, ecapa_loss=0.0001955, whisper_loss=0.09822, over 16945.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01076, ecapa_loss=0.0001552, whisper_loss=0.09092, over 3913657.24 frames. 
], batch size: 67, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:13:29,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2584220.0, ans=0.125 2024-08-14 09:13:30,102 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.42 vs. limit=15.0 2024-08-14 09:13:32,275 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-14 09:13:36,013 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 24 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-14 09:13:38,270 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=8.738e+01 2024-08-14 09:13:53,363 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-14 09:13:55,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2584320.0, ans=0.125 2024-08-14 09:13:59,997 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.25 vs. limit=15.0 2024-08-14 09:14:03,978 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 11 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-14 09:14:07,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2584420.0, ans=0.0 2024-08-14 09:14:13,237 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
35 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-14 09:14:20,243 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.316e+01 2.502e+01 2.940e+01 4.119e+01, threshold=5.004e+01, percent-clipped=0.0 2024-08-14 09:14:35,961 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.46 vs. limit=15.0 2024-08-14 09:14:44,923 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 12100, loss[loss=0.08386, beats_loss=0.009573, ecapa_loss=0.0001322, whisper_loss=0.07297, over 18035.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01072, ecapa_loss=0.0001564, whisper_loss=0.09139, over 3916554.31 frames. ], batch size: 71, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:14:50,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2584720.0, ans=0.125 2024-08-14 09:15:04,093 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 31 from Vox, 28 fro AS 2024-08-14 09:15:06,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2584820.0, ans=0.2 2024-08-14 09:15:22,824 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.82 vs. limit=12.0 2024-08-14 09:15:25,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2584920.0, ans=0.1 2024-08-14 09:15:37,106 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-14 09:15:40,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2585020.0, ans=0.1 2024-08-14 09:15:57,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2585120.0, ans=0.1 2024-08-14 09:16:00,131 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 12150, loss[loss=0.09375, beats_loss=0.01352, ecapa_loss=0.0001248, whisper_loss=0.07898, over 19645.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01072, ecapa_loss=0.0001566, whisper_loss=0.09101, over 3928402.20 frames. ], batch size: 78, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:16:01,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2585220.0, ans=0.0 2024-08-14 09:16:06,688 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 09:16:08,171 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 09:16:30,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2585420.0, ans=0.125 2024-08-14 09:16:32,851 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.43 vs. limit=15.0 2024-08-14 09:16:36,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2585420.0, ans=0.0 2024-08-14 09:16:37,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2585420.0, ans=0.125 2024-08-14 09:16:39,301 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
22 from LS+wenet, 31 from Vox, 39 fro AS 2024-08-14 09:16:40,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2585420.0, ans=0.125 2024-08-14 09:16:50,850 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.391e+01 2.592e+01 3.075e+01 2.876e+02, threshold=5.185e+01, percent-clipped=6.0 2024-08-14 09:16:51,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2585520.0, ans=0.125 2024-08-14 09:17:10,638 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.41 vs. limit=15.0 2024-08-14 09:17:12,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2585620.0, ans=0.125 2024-08-14 09:17:14,205 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-14 09:17:15,295 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 12200, loss[loss=0.09955, beats_loss=0.01086, ecapa_loss=0.0001681, whisper_loss=0.08701, over 20954.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01072, ecapa_loss=0.0001571, whisper_loss=0.09063, over 3918205.43 frames. ], batch size: 84, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:17:15,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2585720.0, ans=0.125 2024-08-14 09:17:16,208 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.36 vs. limit=22.5 2024-08-14 09:17:21,683 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
19 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-14 09:17:53,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2585920.0, ans=0.2 2024-08-14 09:18:02,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2586020.0, ans=0.125 2024-08-14 09:18:07,764 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 09:18:21,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2586120.0, ans=0.125 2024-08-14 09:18:23,350 WARNING [optim.py:496] (2/4) Scaling gradients by 0.06917443126440048, model_norm_threshold=51.84561538696289 2024-08-14 09:18:23,560 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.22, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.256e+05, grad_sumsq=1.256e+05, orig_rms_sq=1.000e+00 2024-08-14 09:18:25,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2586120.0, ans=0.0 2024-08-14 09:18:28,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2586220.0, ans=0.0 2024-08-14 09:18:29,316 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 12250, loss[loss=0.09896, beats_loss=0.01066, ecapa_loss=0.0001639, whisper_loss=0.08666, over 21664.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01071, ecapa_loss=0.0001575, whisper_loss=0.09074, over 3921140.63 frames. ], batch size: 89, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:18:35,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2586220.0, ans=0.04949747468305833 2024-08-14 09:18:37,248 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
19 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-14 09:18:38,809 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.82 vs. limit=15.0 2024-08-14 09:18:41,312 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-14 09:18:54,683 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 18 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-14 09:19:03,085 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.05 vs. limit=22.5 2024-08-14 09:19:13,680 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-14 09:19:23,654 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.783e+01 2.394e+01 2.719e+01 3.099e+01 7.495e+02, threshold=5.439e+01, percent-clipped=1.0 2024-08-14 09:19:36,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2586620.0, ans=0.2 2024-08-14 09:19:44,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2586620.0, ans=0.0 2024-08-14 09:19:47,160 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 12300, loss[loss=0.1061, beats_loss=0.01147, ecapa_loss=0.0001249, whisper_loss=0.09343, over 22777.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0107, ecapa_loss=0.0001585, whisper_loss=0.09053, over 3924494.18 frames. 
], batch size: 89, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:19:52,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2586720.0, ans=0.0 2024-08-14 09:19:52,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2586720.0, ans=0.0 2024-08-14 09:20:10,126 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-14 09:20:16,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2586820.0, ans=0.125 2024-08-14 09:20:26,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2586920.0, ans=0.125 2024-08-14 09:20:27,824 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-14 09:21:06,748 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.92 vs. limit=15.0 2024-08-14 09:21:10,063 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-14 09:21:15,964 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-14 09:21:19,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2587120.0, ans=0.2 2024-08-14 09:21:21,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2587220.0, ans=0.1 2024-08-14 09:21:22,163 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 12350, loss[loss=0.09355, beats_loss=0.009672, ecapa_loss=0.0001785, whisper_loss=0.08209, over 21435.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01066, ecapa_loss=0.0001592, whisper_loss=0.09049, over 3914695.99 frames. ], batch size: 89, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:21:24,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2587220.0, ans=0.125 2024-08-14 09:21:44,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2587320.0, ans=0.125 2024-08-14 09:21:45,074 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.08 vs. limit=15.0 2024-08-14 09:21:53,530 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.501e+01 2024-08-14 09:21:53,810 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.88 vs. limit=15.0 2024-08-14 09:22:22,773 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 23 from LS+wenet, 20 from Vox, 51 fro AS 2024-08-14 09:22:24,042 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.397e+01 2.618e+01 2.960e+01 3.782e+01, threshold=5.235e+01, percent-clipped=0.0 2024-08-14 09:22:27,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2587520.0, ans=0.2 2024-08-14 09:22:29,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2587520.0, ans=0.2 2024-08-14 09:22:29,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=2587520.0, ans=15.0 2024-08-14 09:22:33,233 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
18 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 09:22:41,005 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 43 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-14 09:22:47,830 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 12400, loss[loss=0.1112, beats_loss=0.008402, ecapa_loss=0.0001823, whisper_loss=0.1009, over 22281.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.0001588, whisper_loss=0.09089, over 3925631.13 frames. ], batch size: 89, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:22:48,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2587720.0, ans=0.0 2024-08-14 09:22:58,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2587720.0, ans=0.0 2024-08-14 09:23:01,153 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-14 09:23:10,565 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-14 09:23:18,125 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 20 from LS+wenet, 18 from Vox, 51 fro AS 2024-08-14 09:24:02,898 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 12450, loss[loss=0.07618, beats_loss=0.01404, ecapa_loss=0.0001846, whisper_loss=0.06029, over 20896.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.0001597, whisper_loss=0.09072, over 3881131.98 frames. ], batch size: 93, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:24:03,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2588220.0, ans=10.0 2024-08-14 09:24:07,400 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-14 09:24:10,282 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
31 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-14 09:24:24,411 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-14 09:24:27,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2588320.0, ans=0.0 2024-08-14 09:24:43,484 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 16 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 09:24:55,285 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.420e+01 2.657e+01 3.140e+01 9.625e+01, threshold=5.314e+01, percent-clipped=1.0 2024-08-14 09:24:55,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2588520.0, ans=0.09899494936611666 2024-08-14 09:24:55,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2588520.0, ans=0.125 2024-08-14 09:25:01,354 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 13 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 09:25:18,890 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 12500, loss[loss=0.1033, beats_loss=0.01244, ecapa_loss=0.0001139, whisper_loss=0.08972, over 16510.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01055, ecapa_loss=0.0001583, whisper_loss=0.09122, over 3884180.42 frames. ], batch size: 62, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:25:19,191 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-14 09:25:24,090 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-14 09:25:42,754 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 09:25:43,801 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
37 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-14 09:25:53,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2588920.0, ans=0.125 2024-08-14 09:26:12,735 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 22 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-14 09:26:21,020 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.46 vs. limit=22.5 2024-08-14 09:26:34,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2589220.0, ans=0.1 2024-08-14 09:26:35,400 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 12550, loss[loss=0.09073, beats_loss=0.01295, ecapa_loss=0.0001303, whisper_loss=0.07648, over 22347.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01066, ecapa_loss=0.0001577, whisper_loss=0.0909, over 3898343.47 frames. ], batch size: 91, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:26:36,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2589220.0, ans=0.0 2024-08-14 09:26:37,260 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 21 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-14 09:26:59,643 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 20 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-14 09:27:22,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2589520.0, ans=0.1 2024-08-14 09:27:29,344 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.466e+01 2.734e+01 3.063e+01 5.302e+01, threshold=5.468e+01, percent-clipped=0.0 2024-08-14 09:27:41,558 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
27 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-14 09:27:52,816 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.88 vs. limit=15.0 2024-08-14 09:27:54,995 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 12600, loss[loss=0.1014, beats_loss=0.009109, ecapa_loss=0.0001651, whisper_loss=0.09064, over 22196.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01071, ecapa_loss=0.0001583, whisper_loss=0.09105, over 3915680.71 frames. ], batch size: 88, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:28:16,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2589820.0, ans=0.1 2024-08-14 09:28:30,625 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-14 09:28:43,261 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 28 from Vox, 24 fro AS 2024-08-14 09:28:49,986 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.31 vs. limit=15.0 2024-08-14 09:28:52,863 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 09:29:11,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2590120.0, ans=0.125 2024-08-14 09:29:21,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2590120.0, ans=0.125 2024-08-14 09:29:23,522 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.95 vs. 
limit=22.5 2024-08-14 09:29:29,738 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 12650, loss[loss=0.08616, beats_loss=0.00862, ecapa_loss=0.000177, whisper_loss=0.07577, over 16348.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01075, ecapa_loss=0.0001592, whisper_loss=0.0904, over 3908501.14 frames. ], batch size: 65, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:29:36,447 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 32 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-14 09:29:40,389 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-14 09:29:47,117 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-14 09:29:54,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2590320.0, ans=0.2 2024-08-14 09:30:02,585 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-14 09:30:02,979 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.425e-02 2024-08-14 09:30:27,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2590520.0, ans=0.0 2024-08-14 09:30:36,070 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-14 09:30:37,391 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.252e+01 2.572e+01 2.889e+01 4.246e+01, threshold=5.144e+01, percent-clipped=0.0 2024-08-14 09:31:01,514 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 12700, loss[loss=0.1032, beats_loss=0.0106, ecapa_loss=0.0001374, whisper_loss=0.09125, over 23642.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01076, ecapa_loss=0.0001588, whisper_loss=0.09081, over 3884510.99 frames. 
], batch size: 89, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:31:19,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2590820.0, ans=0.0 2024-08-14 09:31:21,273 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.57 vs. limit=15.0 2024-08-14 09:31:27,581 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.16 vs. limit=15.0 2024-08-14 09:31:30,288 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2024-08-14 09:31:45,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2591020.0, ans=0.125 2024-08-14 09:31:55,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2591020.0, ans=0.125 2024-08-14 09:32:15,729 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 12750, loss[loss=0.1023, beats_loss=0.01139, ecapa_loss=0.000176, whisper_loss=0.08917, over 18618.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01083, ecapa_loss=0.000159, whisper_loss=0.09059, over 3889877.31 frames. ], batch size: 75, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:32:19,558 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.54 vs. 
limit=15.0 2024-08-14 09:32:29,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2591320.0, ans=0.125 2024-08-14 09:32:32,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2591320.0, ans=0.125 2024-08-14 09:33:05,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2591520.0, ans=0.0 2024-08-14 09:33:07,391 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.400e+01 2.605e+01 3.000e+01 2.756e+02, threshold=5.209e+01, percent-clipped=1.0 2024-08-14 09:33:20,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2591620.0, ans=0.05 2024-08-14 09:33:30,134 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 12800, loss[loss=0.1115, beats_loss=0.01153, ecapa_loss=0.0001248, whisper_loss=0.09868, over 16419.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01083, ecapa_loss=0.0001576, whisper_loss=0.09141, over 3926459.61 frames. ], batch size: 62, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:33:37,709 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-14 09:33:55,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2591820.0, ans=0.0 2024-08-14 09:34:02,849 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 11 from Vox, 45 fro AS 2024-08-14 09:34:07,107 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-14 09:34:16,596 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
18 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 09:34:24,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2592020.0, ans=0.125 2024-08-14 09:34:28,309 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 09:34:35,530 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 09:35:17,927 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 12850, loss[loss=0.09703, beats_loss=0.01186, ecapa_loss=0.0001531, whisper_loss=0.08363, over 14652.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01088, ecapa_loss=0.000158, whisper_loss=0.09019, over 3863218.41 frames. ], batch size: 60, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:35:48,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2592320.0, ans=0.125 2024-08-14 09:36:05,695 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.30 vs. 
limit=15.0 2024-08-14 09:36:09,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2592420.0, ans=0.125 2024-08-14 09:36:09,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2592420.0, ans=0.025 2024-08-14 09:36:20,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2592520.0, ans=0.125 2024-08-14 09:36:23,578 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.339e+01 2.541e+01 2.791e+01 1.384e+02, threshold=5.082e+01, percent-clipped=3.0 2024-08-14 09:36:29,258 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.56 vs. limit=15.0 2024-08-14 09:36:30,714 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.89 vs. limit=15.0 2024-08-14 09:36:46,318 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 12900, loss[loss=0.07983, beats_loss=0.00867, ecapa_loss=0.0002004, whisper_loss=0.06916, over 14891.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01083, ecapa_loss=0.0001593, whisper_loss=0.09033, over 3869110.25 frames. ], batch size: 59, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:37:04,369 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 09:37:25,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2592820.0, ans=0.125 2024-08-14 09:37:54,650 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
18 from LS+wenet, 34 from Vox, 33 fro AS 2024-08-14 09:38:09,550 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2024-08-14 09:38:37,806 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 12950, loss[loss=0.1024, beats_loss=0.01139, ecapa_loss=0.0001272, whisper_loss=0.08975, over 22193.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01078, ecapa_loss=0.0001589, whisper_loss=0.09054, over 3864347.08 frames. ], batch size: 89, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:38:47,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2593220.0, ans=0.125 2024-08-14 09:38:49,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2593220.0, ans=0.2 2024-08-14 09:39:09,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2593320.0, ans=0.125 2024-08-14 09:39:42,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2593520.0, ans=0.1 2024-08-14 09:39:50,469 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.412e+01 2.710e+01 3.075e+01 4.932e+01, threshold=5.420e+01, percent-clipped=0.0 2024-08-14 09:39:54,720 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.89 vs. limit=15.0 2024-08-14 09:40:01,008 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 21 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-14 09:40:19,918 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.32 vs. 
limit=15.0 2024-08-14 09:40:27,837 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 13000, loss[loss=0.08366, beats_loss=0.01143, ecapa_loss=0.0001765, whisper_loss=0.07047, over 16668.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01082, ecapa_loss=0.0001575, whisper_loss=0.09034, over 3864572.28 frames. ], batch size: 68, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:40:32,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2593720.0, ans=0.125 2024-08-14 09:40:34,381 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-14 09:40:59,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2593820.0, ans=0.125 2024-08-14 09:41:17,910 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 25 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-14 09:41:27,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2593920.0, ans=0.1 2024-08-14 09:41:44,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=2594020.0, ans=10.0 2024-08-14 09:41:52,417 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-14 09:41:55,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2594020.0, ans=0.125 2024-08-14 09:41:59,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2594120.0, ans=0.125 2024-08-14 09:42:12,960 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
25 from LS+wenet, 34 from Vox, 31 fro AS 2024-08-14 09:42:22,878 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 13050, loss[loss=0.1131, beats_loss=0.01017, ecapa_loss=0.0001571, whisper_loss=0.1014, over 21905.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0107, ecapa_loss=0.000159, whisper_loss=0.09086, over 3857695.83 frames. ], batch size: 87, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:43:10,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2594420.0, ans=0.125 2024-08-14 09:43:18,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2594420.0, ans=0.125 2024-08-14 09:43:32,902 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.734e+01 2.338e+01 2.607e+01 2.942e+01 4.688e+01, threshold=5.215e+01, percent-clipped=0.0 2024-08-14 09:43:37,466 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 19 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 09:44:00,064 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.30 vs. limit=15.0 2024-08-14 09:44:03,062 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 13100, loss[loss=0.1056, beats_loss=0.008768, ecapa_loss=0.0001536, whisper_loss=0.0953, over 18346.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01074, ecapa_loss=0.0001572, whisper_loss=0.09062, over 3865717.80 frames. ], batch size: 70, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:44:04,216 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.63 vs. limit=10.0 2024-08-14 09:44:12,611 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-14 09:44:18,240 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
18 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-14 09:44:24,683 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 23 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-14 09:44:26,724 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 22 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-14 09:44:32,754 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-14 09:44:37,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2594820.0, ans=0.09899494936611666 2024-08-14 09:45:22,618 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 09:45:29,478 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 13150, loss[loss=0.0976, beats_loss=0.01231, ecapa_loss=0.0001875, whisper_loss=0.08341, over 18614.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01077, ecapa_loss=0.0001562, whisper_loss=0.09093, over 3871065.10 frames. ], batch size: 79, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:45:34,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2595220.0, ans=0.125 2024-08-14 09:45:36,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2595220.0, ans=0.0 2024-08-14 09:45:49,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2595320.0, ans=0.125 2024-08-14 09:45:53,131 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.52 vs. 
limit=15.0 2024-08-14 09:46:01,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2595420.0, ans=0.125 2024-08-14 09:46:14,949 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 28 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 09:46:20,202 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.308e+01 2.613e+01 2.975e+01 4.681e+01, threshold=5.226e+01, percent-clipped=0.0 2024-08-14 09:46:23,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2595520.0, ans=0.0 2024-08-14 09:46:26,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2595620.0, ans=0.125 2024-08-14 09:46:42,154 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 13200, loss[loss=0.107, beats_loss=0.009509, ecapa_loss=0.0001427, whisper_loss=0.09608, over 22758.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01076, ecapa_loss=0.0001564, whisper_loss=0.09071, over 3880053.09 frames. ], batch size: 90, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:46:46,310 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.21 vs. limit=22.5 2024-08-14 09:47:01,072 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-14 09:47:08,192 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
43 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 09:47:39,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2596020.0, ans=0.125 2024-08-14 09:47:45,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2596020.0, ans=0.125 2024-08-14 09:47:51,916 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.63 vs. limit=15.0 2024-08-14 09:47:58,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2596120.0, ans=0.07 2024-08-14 09:48:16,367 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 13250, loss[loss=0.1069, beats_loss=0.01051, ecapa_loss=0.0001584, whisper_loss=0.09485, over 16017.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0107, ecapa_loss=0.0001576, whisper_loss=0.09112, over 3915822.85 frames. ], batch size: 64, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:48:16,606 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-14 09:48:26,925 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 09:48:36,886 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 27 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-14 09:48:45,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2596320.0, ans=0.125 2024-08-14 09:48:57,569 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 32 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-14 09:49:05,494 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
31 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 09:49:26,596 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.423e+01 2.644e+01 3.025e+01 2.002e+02, threshold=5.289e+01, percent-clipped=3.0 2024-08-14 09:49:27,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2596520.0, ans=0.04949747468305833 2024-08-14 09:49:57,289 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 13300, loss[loss=0.1068, beats_loss=0.008873, ecapa_loss=0.0002035, whisper_loss=0.09591, over 19813.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01064, ecapa_loss=0.0001578, whisper_loss=0.09124, over 3888978.12 frames. ], batch size: 87, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:49:57,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2596720.0, ans=0.125 2024-08-14 09:50:13,885 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.640e-03 2024-08-14 09:50:17,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2596820.0, ans=0.0 2024-08-14 09:50:21,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2596820.0, ans=0.125 2024-08-14 09:50:44,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2596920.0, ans=0.0 2024-08-14 09:50:47,604 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.38 vs. limit=22.5 2024-08-14 09:50:55,879 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 19 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-14 09:50:59,275 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
26 from LS+wenet, 32 from Vox, 32 fro AS 2024-08-14 09:51:12,051 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-14 09:51:24,868 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 09:51:29,128 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.08 vs. limit=15.0 2024-08-14 09:51:31,036 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 13350, loss[loss=0.1107, beats_loss=0.01019, ecapa_loss=0.0001786, whisper_loss=0.09869, over 22039.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01066, ecapa_loss=0.0001581, whisper_loss=0.091, over 3897806.80 frames. ], batch size: 91, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:51:34,234 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-14 09:52:09,929 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-14 09:52:24,746 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.07 vs. limit=12.0 2024-08-14 09:52:25,068 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.040e+01 2.409e+01 2.670e+01 3.063e+01 5.921e+01, threshold=5.339e+01, percent-clipped=1.0 2024-08-14 09:52:29,895 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 09:52:47,935 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 13400, loss[loss=0.1074, beats_loss=0.01032, ecapa_loss=0.0001745, whisper_loss=0.09532, over 16945.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01069, ecapa_loss=0.0001582, whisper_loss=0.09049, over 3887296.46 frames. 
], batch size: 67, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:52:52,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2597720.0, ans=0.125 2024-08-14 09:52:52,297 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.87 vs. limit=22.5 2024-08-14 09:52:54,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2597720.0, ans=0.125 2024-08-14 09:52:55,207 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.25 vs. limit=22.5 2024-08-14 09:53:01,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2597720.0, ans=0.125 2024-08-14 09:53:12,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2597820.0, ans=0.0 2024-08-14 09:53:24,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2597920.0, ans=0.2 2024-08-14 09:53:27,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2597920.0, ans=0.1 2024-08-14 09:53:42,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2598020.0, ans=0.0 2024-08-14 09:53:46,523 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
24 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 09:53:53,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=2598120.0, ans=0.025 2024-08-14 09:54:06,821 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 13450, loss[loss=0.1035, beats_loss=0.01229, ecapa_loss=0.0001386, whisper_loss=0.08985, over 23106.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01066, ecapa_loss=0.0001577, whisper_loss=0.09055, over 3907942.71 frames. ], batch size: 94, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:54:30,116 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-14 09:54:30,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2598320.0, ans=0.125 2024-08-14 09:54:43,867 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-14 09:54:55,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=2598520.0, ans=0.95 2024-08-14 09:54:55,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2598520.0, ans=0.1 2024-08-14 09:54:57,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2598520.0, ans=0.0 2024-08-14 09:54:59,300 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.340e+01 2.558e+01 2.955e+01 5.061e+01, threshold=5.115e+01, percent-clipped=0.0 2024-08-14 09:55:03,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2598520.0, ans=0.125 2024-08-14 09:55:05,782 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
27 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-14 09:55:20,857 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 13500, loss[loss=0.1136, beats_loss=0.01087, ecapa_loss=0.0001534, whisper_loss=0.1012, over 20352.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01066, ecapa_loss=0.0001567, whisper_loss=0.0907, over 3887708.32 frames. ], batch size: 80, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:55:28,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2598720.0, ans=0.0 2024-08-14 09:55:45,187 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-14 09:55:49,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2598920.0, ans=0.125 2024-08-14 09:55:52,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2598920.0, ans=0.0 2024-08-14 09:56:08,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2599020.0, ans=0.125 2024-08-14 09:56:11,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2599020.0, ans=0.125 2024-08-14 09:56:17,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2599020.0, ans=0.2 2024-08-14 09:56:27,855 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 23 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-14 09:56:33,446 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 13550, loss[loss=0.1231, beats_loss=0.008916, ecapa_loss=0.0001273, whisper_loss=0.1129, over 18198.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01066, ecapa_loss=0.0001562, whisper_loss=0.09083, over 3875374.26 frames. 
], batch size: 66, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:56:52,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2599320.0, ans=0.1 2024-08-14 09:57:20,405 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.67 vs. limit=22.5 2024-08-14 09:57:24,011 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.303e+01 2.544e+01 2.977e+01 7.464e+01, threshold=5.088e+01, percent-clipped=1.0 2024-08-14 09:57:25,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2599520.0, ans=0.125 2024-08-14 09:57:45,662 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 13600, loss[loss=0.09024, beats_loss=0.0109, ecapa_loss=0.0001884, whisper_loss=0.07746, over 21077.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01064, ecapa_loss=0.0001568, whisper_loss=0.09166, over 3886208.56 frames. ], batch size: 91, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:57:49,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2599720.0, ans=0.0 2024-08-14 09:58:15,236 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 19 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-14 09:58:24,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2599920.0, ans=0.125 2024-08-14 09:58:33,798 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-14 09:59:01,290 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 13650, loss[loss=0.1115, beats_loss=0.01167, ecapa_loss=0.0001533, whisper_loss=0.09826, over 18128.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01064, ecapa_loss=0.000158, whisper_loss=0.09173, over 3881644.22 frames. ], batch size: 71, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:59:08,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2600220.0, ans=0.04949747468305833 2024-08-14 09:59:10,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2600220.0, ans=0.2 2024-08-14 09:59:20,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2600320.0, ans=0.0 2024-08-14 09:59:25,183 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.41 vs. limit=15.0 2024-08-14 09:59:27,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2600320.0, ans=0.125 2024-08-14 09:59:27,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2600320.0, ans=0.125 2024-08-14 09:59:47,090 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
20 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-14 09:59:51,093 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.604e+01 2.361e+01 2.642e+01 3.028e+01 5.099e+01, threshold=5.285e+01, percent-clipped=1.0 2024-08-14 09:59:51,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2600520.0, ans=0.125 2024-08-14 09:59:54,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2600520.0, ans=0.1 2024-08-14 09:59:57,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2600620.0, ans=0.125 2024-08-14 09:59:57,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2600620.0, ans=0.125 2024-08-14 10:00:13,504 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 13700, loss[loss=0.1012, beats_loss=0.01072, ecapa_loss=0.0001319, whisper_loss=0.08918, over 14506.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01063, ecapa_loss=0.000157, whisper_loss=0.09178, over 3869086.93 frames. ], batch size: 55, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:00:20,525 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.31 vs. limit=15.0 2024-08-14 10:00:21,802 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.69 vs. 
limit=12.0 2024-08-14 10:00:23,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2600720.0, ans=0.125 2024-08-14 10:00:24,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2600720.0, ans=0.0 2024-08-14 10:00:25,589 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-14 10:00:28,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2600820.0, ans=0.2 2024-08-14 10:00:32,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2600820.0, ans=0.1 2024-08-14 10:00:34,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2600820.0, ans=0.125 2024-08-14 10:00:45,449 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-14 10:01:01,393 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-14 10:01:03,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2601020.0, ans=0.1 2024-08-14 10:01:17,380 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-14 10:01:23,606 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 32 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-14 10:01:26,331 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 13750, loss[loss=0.09314, beats_loss=0.0126, ecapa_loss=0.0001494, whisper_loss=0.07904, over 19689.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01061, ecapa_loss=0.0001581, whisper_loss=0.09177, over 3867876.29 frames. 
], batch size: 83, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:01:26,697 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-14 10:01:27,958 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-14 10:01:28,620 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2024-08-14 10:01:34,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2601220.0, ans=0.125 2024-08-14 10:01:36,555 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 20 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-14 10:01:36,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2601220.0, ans=0.0 2024-08-14 10:01:39,445 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-14 10:01:40,932 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-14 10:01:51,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2601320.0, ans=0.125 2024-08-14 10:02:09,155 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
17 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 10:02:16,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2601520.0, ans=0.0 2024-08-14 10:02:17,140 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.453e+01 2.720e+01 3.144e+01 4.957e+01, threshold=5.441e+01, percent-clipped=0.0 2024-08-14 10:02:20,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2601520.0, ans=0.0 2024-08-14 10:02:20,561 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.63 vs. limit=15.0 2024-08-14 10:02:27,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2601620.0, ans=0.0 2024-08-14 10:02:31,653 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.94 vs. limit=15.0 2024-08-14 10:02:35,037 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.45 vs. limit=15.0 2024-08-14 10:02:39,713 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 13800, loss[loss=0.1113, beats_loss=0.01114, ecapa_loss=0.0001159, whisper_loss=0.09898, over 18250.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01059, ecapa_loss=0.0001574, whisper_loss=0.09203, over 3873102.52 frames. 
], batch size: 70, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:02:40,258 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.025e-02 2024-08-14 10:02:41,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2601720.0, ans=0.125 2024-08-14 10:02:46,852 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 16 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 10:02:48,018 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-14 10:03:21,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2601920.0, ans=0.125 2024-08-14 10:03:22,516 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.533e-03 2024-08-14 10:03:27,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2602020.0, ans=0.0 2024-08-14 10:03:51,498 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 13850, loss[loss=0.1214, beats_loss=0.009728, ecapa_loss=0.0001398, whisper_loss=0.1102, over 22137.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01061, ecapa_loss=0.0001565, whisper_loss=0.09177, over 3881238.90 frames. ], batch size: 83, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:03:52,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2602220.0, ans=0.0 2024-08-14 10:03:58,825 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 10:04:04,597 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
39 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-14 10:04:05,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2602320.0, ans=0.1 2024-08-14 10:04:06,026 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-14 10:04:16,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2602320.0, ans=0.0 2024-08-14 10:04:20,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2602420.0, ans=0.0 2024-08-14 10:04:28,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2602420.0, ans=0.125 2024-08-14 10:04:40,995 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.427e+01 2.695e+01 2.897e+01 4.823e+02, threshold=5.391e+01, percent-clipped=1.0 2024-08-14 10:04:50,607 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-14 10:04:50,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2602620.0, ans=0.125 2024-08-14 10:04:52,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2602620.0, ans=0.2 2024-08-14 10:05:00,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2602620.0, ans=0.0 2024-08-14 10:05:02,924 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 13900, loss[loss=0.102, beats_loss=0.0121, ecapa_loss=0.0001236, whisper_loss=0.08869, over 18175.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0106, ecapa_loss=0.0001572, whisper_loss=0.09183, over 3886108.99 frames. 
], batch size: 69, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:05:06,965 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.84 vs. limit=15.0 2024-08-14 10:05:10,277 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 22 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-14 10:05:15,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2602720.0, ans=0.0 2024-08-14 10:05:35,173 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 10:05:39,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2602920.0, ans=0.125 2024-08-14 10:05:45,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2603020.0, ans=0.2 2024-08-14 10:06:15,017 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 13950, loss[loss=0.1137, beats_loss=0.009647, ecapa_loss=0.000153, whisper_loss=0.1025, over 18355.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01062, ecapa_loss=0.0001571, whisper_loss=0.09166, over 3886972.45 frames. 
], batch size: 71, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:06:36,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2603320.0, ans=0.125 2024-08-14 10:07:03,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2603520.0, ans=0.0 2024-08-14 10:07:04,683 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.381e+01 2.587e+01 2.937e+01 5.454e+01, threshold=5.174e+01, percent-clipped=1.0 2024-08-14 10:07:05,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2603520.0, ans=0.125 2024-08-14 10:07:11,950 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 10:07:18,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2603620.0, ans=0.125 2024-08-14 10:07:23,365 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 10:07:24,936 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-14 10:07:26,067 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 14000, loss[loss=0.1018, beats_loss=0.0108, ecapa_loss=0.0001244, whisper_loss=0.08979, over 17457.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0106, ecapa_loss=0.000156, whisper_loss=0.0921, over 3891276.31 frames. ], batch size: 65, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:07:32,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2603720.0, ans=0.125 2024-08-14 10:07:39,813 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
25 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-14 10:07:44,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2603820.0, ans=0.0 2024-08-14 10:08:02,081 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.91 vs. limit=15.0 2024-08-14 10:08:23,904 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.49 vs. limit=22.5 2024-08-14 10:08:29,175 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-14 10:08:32,082 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-14 10:08:39,326 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 14050, loss[loss=0.07456, beats_loss=0.01345, ecapa_loss=0.0001517, whisper_loss=0.0596, over 22540.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0106, ecapa_loss=0.0001554, whisper_loss=0.0922, over 3874119.66 frames. ], batch size: 94, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:08:41,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2604220.0, ans=0.125 2024-08-14 10:08:57,899 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 10:09:01,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2604320.0, ans=0.0 2024-08-14 10:09:01,578 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.79 vs. 
limit=15.0 2024-08-14 10:09:18,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2604420.0, ans=0.125 2024-08-14 10:09:28,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2604520.0, ans=0.0 2024-08-14 10:09:29,565 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.346e+01 2.567e+01 2.904e+01 5.000e+01, threshold=5.134e+01, percent-clipped=0.0 2024-08-14 10:09:34,609 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 29 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-14 10:09:44,475 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 10:09:50,856 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 14100, loss[loss=0.09558, beats_loss=0.008908, ecapa_loss=0.0001979, whisper_loss=0.08469, over 18070.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01061, ecapa_loss=0.0001556, whisper_loss=0.09211, over 3887865.22 frames. ], batch size: 76, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:09:53,436 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.06 vs. limit=12.0 2024-08-14 10:09:59,892 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-14 10:10:01,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2604720.0, ans=0.0 2024-08-14 10:10:11,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2604820.0, ans=0.0 2024-08-14 10:10:47,959 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.60 vs. 
limit=22.5 2024-08-14 10:10:48,735 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 14 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-14 10:11:03,249 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 14150, loss[loss=0.0764, beats_loss=0.009196, ecapa_loss=0.0001859, whisper_loss=0.06534, over 13485.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01062, ecapa_loss=0.0001554, whisper_loss=0.09209, over 3914853.97 frames. ], batch size: 54, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:11:10,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2605220.0, ans=0.0 2024-08-14 10:11:35,520 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.65 vs. limit=15.0 2024-08-14 10:11:41,897 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 13 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-14 10:11:42,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2605420.0, ans=0.125 2024-08-14 10:11:46,446 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 16 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-14 10:11:53,307 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.390e+01 2.553e+01 2.829e+01 7.364e+01, threshold=5.106e+01, percent-clipped=2.0 2024-08-14 10:11:56,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2605520.0, ans=0.125 2024-08-14 10:12:00,833 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
27 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-14 10:12:08,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2605620.0, ans=0.0 2024-08-14 10:12:15,770 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 14200, loss[loss=0.114, beats_loss=0.008756, ecapa_loss=0.0001617, whisper_loss=0.1036, over 23161.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0106, ecapa_loss=0.0001552, whisper_loss=0.09185, over 3936628.37 frames. ], batch size: 91, lr: 3.43e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:12:17,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2605720.0, ans=0.0 2024-08-14 10:12:19,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2605720.0, ans=0.125 2024-08-14 10:12:47,103 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 10:12:56,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2605920.0, ans=0.125 2024-08-14 10:12:57,569 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 19 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-14 10:12:57,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2606020.0, ans=0.0 2024-08-14 10:13:00,373 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-14 10:13:05,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2606020.0, ans=0.2 2024-08-14 10:13:06,674 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.17 vs. 
limit=22.5 2024-08-14 10:13:07,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2606020.0, ans=0.0 2024-08-14 10:13:24,894 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 10:13:27,333 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 14250, loss[loss=0.1194, beats_loss=0.01035, ecapa_loss=0.0001375, whisper_loss=0.1076, over 22109.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01066, ecapa_loss=0.0001555, whisper_loss=0.09118, over 3917838.20 frames. ], batch size: 85, lr: 3.43e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:13:29,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=2606220.0, ans=0.5 2024-08-14 10:13:43,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2606320.0, ans=0.125 2024-08-14 10:13:49,226 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 21 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 10:13:55,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=2606420.0, ans=6.0 2024-08-14 10:14:18,361 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.438e+01 2.661e+01 3.044e+01 6.273e+01, threshold=5.322e+01, percent-clipped=2.0 2024-08-14 10:14:20,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2606520.0, ans=0.125 2024-08-14 10:14:21,630 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
20 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-14 10:14:23,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2606520.0, ans=0.0 2024-08-14 10:14:24,472 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 20 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 10:14:39,866 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 14300, loss[loss=0.1064, beats_loss=0.01006, ecapa_loss=0.0001722, whisper_loss=0.0946, over 20268.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01079, ecapa_loss=0.0001548, whisper_loss=0.09099, over 3945491.72 frames. ], batch size: 84, lr: 3.43e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:14:47,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2606720.0, ans=0.1 2024-08-14 10:14:49,906 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 18 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-14 10:14:54,124 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 10:14:58,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2606820.0, ans=0.125 2024-08-14 10:15:04,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2606820.0, ans=0.125 2024-08-14 10:15:08,829 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 18 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-14 10:15:30,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2607020.0, ans=0.0 2024-08-14 10:15:55,687 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 14350, loss[loss=0.1037, beats_loss=0.01068, ecapa_loss=0.000125, whisper_loss=0.0918, over 22635.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01075, ecapa_loss=0.0001553, whisper_loss=0.09058, over 3931412.79 frames. ], batch size: 86, lr: 3.43e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:16:10,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2607220.0, ans=0.125 2024-08-14 10:16:15,676 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.15 vs. limit=15.0 2024-08-14 10:16:17,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2607320.0, ans=0.125 2024-08-14 10:16:21,026 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.89 vs. limit=15.0 2024-08-14 10:16:29,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2607420.0, ans=0.125 2024-08-14 10:16:29,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2607420.0, ans=0.125 2024-08-14 10:16:33,191 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.69 vs. limit=22.5 2024-08-14 10:16:43,865 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
29 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 10:16:46,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2607520.0, ans=0.0 2024-08-14 10:16:55,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2607520.0, ans=0.125 2024-08-14 10:16:56,424 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.361e+01 2.607e+01 2.997e+01 4.259e+01, threshold=5.213e+01, percent-clipped=0.0 2024-08-14 10:17:09,690 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 10:17:23,446 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 14400, loss[loss=0.1011, beats_loss=0.01192, ecapa_loss=0.0001419, whisper_loss=0.0878, over 21684.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01074, ecapa_loss=0.000155, whisper_loss=0.09108, over 3931161.85 frames. ], batch size: 88, lr: 3.43e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:17:34,464 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-14 10:17:42,634 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-14 10:17:46,200 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 18 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 10:17:50,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2607820.0, ans=0.1 2024-08-14 10:17:53,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2607820.0, ans=0.0 2024-08-14 10:17:53,635 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.92 vs. 
limit=15.0 2024-08-14 10:17:55,386 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0 2024-08-14 10:17:58,433 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.17 vs. limit=15.0 2024-08-14 10:18:02,361 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.145e-01 2024-08-14 10:18:23,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2608020.0, ans=0.0 2024-08-14 10:18:45,788 INFO [train_multi_KD3.py:1116] (2/4) Epoch 18, batch 14450, loss[loss=0.1018, beats_loss=0.01077, ecapa_loss=0.0001602, whisper_loss=0.08938, over 21019.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01074, ecapa_loss=0.0001575, whisper_loss=0.09078, over 3936421.21 frames. ], batch size: 82, lr: 3.43e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:19:05,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2608320.0, ans=0.0 2024-08-14 10:19:06,306 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-14 10:19:16,110 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.75 vs. limit=15.0 2024-08-14 10:19:18,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2608420.0, ans=0.0 2024-08-14 10:19:55,442 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 0, loss[loss=0.07414, beats_loss=0.007999, ecapa_loss=0.0002153, whisper_loss=0.06399, over 14315.00 frames. ], tot_loss[loss=0.07414, beats_loss=0.007999, ecapa_loss=0.0002153, whisper_loss=0.06399, over 14315.00 frames. 
], batch size: 58, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:19:55,442 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-14 10:20:37,885 INFO [train_multi_KD3.py:1149] (2/4) Epoch 19, validation on ASR_libri: loss=0.2539, beats_loss=0, ecapa_loss=0.0005486, whisper_loss=0.2484, over 922467.00 frames. 2024-08-14 10:20:53,977 INFO [train_multi_KD3.py:1149] (2/4) Epoch 19, validation on SV_voxceleb1: loss=0.004382, beats_loss=0, ecapa_loss=0.0004382, whisper_loss=0, over 939242.00 frames. 2024-08-14 10:21:06,196 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.7740, 2.3282, 2.2421, 2.3164], device='cuda:2') 2024-08-14 10:22:56,816 INFO [train_multi_KD3.py:1149] (2/4) Epoch 19, validation on AT_audioset: loss=0.02338, beats_loss=0.02338, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 10:22:56,819 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-14 10:23:05,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2608520.0, ans=0.125 2024-08-14 10:23:09,190 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.393e+01 2.576e+01 3.022e+01 6.974e+01, threshold=5.152e+01, percent-clipped=1.0 2024-08-14 10:23:58,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2608720.0, ans=0.125 2024-08-14 10:24:18,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2608820.0, ans=0.2 2024-08-14 10:24:28,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2608820.0, ans=0.125 2024-08-14 10:24:46,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2608920.0, 
ans=0.0 2024-08-14 10:24:49,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2608920.0, ans=0.04949747468305833 2024-08-14 10:24:54,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2608920.0, ans=0.125 2024-08-14 10:25:06,144 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 50, loss[loss=0.1002, beats_loss=0.00976, ecapa_loss=0.0001528, whisper_loss=0.08891, over 22390.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01, ecapa_loss=0.000161, whisper_loss=0.08839, over 864452.74 frames. ], batch size: 86, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:25:11,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2609020.0, ans=0.125 2024-08-14 10:25:11,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2609020.0, ans=0.0 2024-08-14 10:25:24,447 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.73 vs. limit=15.0 2024-08-14 10:25:38,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2609120.0, ans=0.125 2024-08-14 10:25:42,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2609120.0, ans=0.0 2024-08-14 10:25:47,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2609120.0, ans=0.0 2024-08-14 10:26:34,462 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
19 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-14 10:26:56,406 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.17 vs. limit=10.0 2024-08-14 10:27:04,949 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 100, loss[loss=0.1115, beats_loss=0.01029, ecapa_loss=0.0001474, whisper_loss=0.09974, over 22688.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.009836, ecapa_loss=0.0001571, whisper_loss=0.09023, over 1529000.45 frames. ], batch size: 90, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:27:15,294 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-14 10:27:16,384 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.117e+01 2.614e+01 2.833e+01 3.144e+01 8.943e+01, threshold=5.666e+01, percent-clipped=3.0 2024-08-14 10:27:23,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2609520.0, ans=0.025 2024-08-14 10:27:38,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2609620.0, ans=0.1 2024-08-14 10:28:22,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2609820.0, ans=0.125 2024-08-14 10:28:33,252 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.38 vs. 
limit=15.0 2024-08-14 10:28:35,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2609820.0, ans=0.125 2024-08-14 10:28:50,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2609920.0, ans=0.125 2024-08-14 10:28:52,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2609920.0, ans=0.125 2024-08-14 10:28:56,365 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 150, loss[loss=0.1033, beats_loss=0.0116, ecapa_loss=0.0001606, whisper_loss=0.09013, over 23344.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.009792, ecapa_loss=0.000157, whisper_loss=0.0905, over 2067514.68 frames. ], batch size: 95, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:29:12,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2610020.0, ans=0.0 2024-08-14 10:29:22,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2610120.0, ans=0.0 2024-08-14 10:29:29,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2610220.0, ans=0.125 2024-08-14 10:29:34,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2610220.0, ans=0.0 2024-08-14 10:29:52,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2610320.0, ans=0.125 2024-08-14 10:29:58,646 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
16 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-14 10:30:08,895 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.14 vs. limit=15.0 2024-08-14 10:30:18,288 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 200, loss[loss=0.07609, beats_loss=0.008668, ecapa_loss=0.0002209, whisper_loss=0.06522, over 17196.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.009985, ecapa_loss=0.0001575, whisper_loss=0.08932, over 2419496.14 frames. ], batch size: 72, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:30:18,437 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 10:30:21,269 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-14 10:30:25,922 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.407e+01 2.774e+01 3.039e+01 4.574e+01, threshold=5.548e+01, percent-clipped=0.0 2024-08-14 10:30:30,121 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 14 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-14 10:30:34,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2610620.0, ans=0.1 2024-08-14 10:30:42,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2610620.0, ans=0.1 2024-08-14 10:31:04,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2610820.0, ans=0.1 2024-08-14 10:31:10,250 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.54 vs. limit=15.0 2024-08-14 10:31:10,900 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
28 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-14 10:31:22,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2610920.0, ans=0.0 2024-08-14 10:31:27,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2610920.0, ans=0.125 2024-08-14 10:31:37,726 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 250, loss[loss=0.09087, beats_loss=0.01395, ecapa_loss=0.0001849, whisper_loss=0.07507, over 19667.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01019, ecapa_loss=0.0001575, whisper_loss=0.08983, over 2734640.65 frames. ], batch size: 86, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:31:47,479 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 29 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-14 10:32:27,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2611320.0, ans=0.125 2024-08-14 10:33:01,451 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 300, loss[loss=0.09505, beats_loss=0.009847, ecapa_loss=0.0001291, whisper_loss=0.08391, over 23902.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01033, ecapa_loss=0.0001573, whisper_loss=0.09005, over 2972194.40 frames. ], batch size: 90, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:33:10,135 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.366e+01 2.598e+01 2.945e+01 2.183e+02, threshold=5.197e+01, percent-clipped=2.0 2024-08-14 10:33:47,658 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 10:34:03,213 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
20 from LS+wenet, 16 from Vox, 45 from AS 2024-08-14 10:34:11,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2611920.0, ans=0.2 2024-08-14 10:34:16,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2611920.0, ans=0.0 2024-08-14 10:34:29,465 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 350, loss[loss=0.09852, beats_loss=0.01213, ecapa_loss=0.0001719, whisper_loss=0.08468, over 22118.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01049, ecapa_loss=0.000156, whisper_loss=0.08839, over 3128657.50 frames. ], batch size: 92, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:34:31,332 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 from AS 2024-08-14 10:35:05,883 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.641e+00 2024-08-14 10:35:20,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2612320.0, ans=0.0 2024-08-14 10:35:20,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2612320.0, ans=0.0 2024-08-14 10:35:53,730 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 400, loss[loss=0.09633, beats_loss=0.01086, ecapa_loss=0.0001472, whisper_loss=0.084, over 17142.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01049, ecapa_loss=0.0001553, whisper_loss=0.08939, over 3306646.28 frames. ], batch size: 66, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:36:01,861 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.645e+01 2.324e+01 2.549e+01 2.797e+01 3.225e+02, threshold=5.099e+01, percent-clipped=2.0 2024-08-14 10:36:08,380 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
29 from LS+wenet, 11 from Vox, 41 from AS 2024-08-14 10:36:35,921 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-08-14 10:36:56,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2612820.0, ans=0.125 2024-08-14 10:37:08,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2612920.0, ans=0.0 2024-08-14 10:37:17,647 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 450, loss[loss=0.09937, beats_loss=0.0115, ecapa_loss=0.0001295, whisper_loss=0.08657, over 19992.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01052, ecapa_loss=0.0001552, whisper_loss=0.08927, over 3430868.69 frames. ], batch size: 76, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:37:28,715 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 19 from Vox, 41 from AS 2024-08-14 10:37:51,657 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 17 from LS+wenet, 21 from Vox, 33 from AS 2024-08-14 10:38:10,289 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
17 from LS+wenet, 17 from Vox, 21 from AS 2024-08-14 10:38:22,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2613320.0, ans=0.05 2024-08-14 10:38:34,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2613420.0, ans=0.0 2024-08-14 10:38:40,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2613420.0, ans=0.2 2024-08-14 10:38:47,405 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 500, loss[loss=0.07617, beats_loss=0.009102, ecapa_loss=0.0002001, whisper_loss=0.06506, over 15773.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01062, ecapa_loss=0.0001543, whisper_loss=0.08841, over 3514450.57 frames. ], batch size: 62, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:38:50,628 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.87 vs. limit=10.0 2024-08-14 10:38:56,743 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.290e+01 2.547e+01 2.928e+01 5.420e+01, threshold=5.093e+01, percent-clipped=1.0 2024-08-14 10:39:09,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2613620.0, ans=0.0 2024-08-14 10:39:32,789 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 18 from Vox, 21 from AS 2024-08-14 10:39:35,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2613720.0, ans=0.125 2024-08-14 10:39:44,258 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
14 from LS+wenet, 16 from Vox, 27 from AS 2024-08-14 10:39:52,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2613820.0, ans=0.125 2024-08-14 10:40:07,212 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 39 from LS+wenet, 13 from Vox, 29 from AS 2024-08-14 10:40:08,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2613920.0, ans=0.125 2024-08-14 10:40:13,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2613920.0, ans=0.125 2024-08-14 10:40:18,802 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 550, loss[loss=0.09415, beats_loss=0.01018, ecapa_loss=0.0002243, whisper_loss=0.08172, over 15987.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01064, ecapa_loss=0.0001546, whisper_loss=0.08834, over 3578068.91 frames. ], batch size: 67, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:40:27,478 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 16 from LS+wenet, 29 from Vox, 44 from AS 2024-08-14 10:40:33,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2614020.0, ans=0.125 2024-08-14 10:40:34,617 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 11 from Vox, 34 from AS 2024-08-14 10:40:38,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2614120.0, ans=0.125 2024-08-14 10:40:47,275 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 26 from LS+wenet, 21 from Vox, 25 from AS 2024-08-14 10:40:57,790 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
23 from LS+wenet, 20 from Vox, 42 from AS 2024-08-14 10:41:46,277 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 600, loss[loss=0.08154, beats_loss=0.009646, ecapa_loss=0.0001772, whisper_loss=0.07012, over 18173.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01066, ecapa_loss=0.0001546, whisper_loss=0.08791, over 3625040.47 frames. ], batch size: 75, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:41:53,788 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.288e+01 2.520e+01 2.805e+01 9.045e+01, threshold=5.041e+01, percent-clipped=2.0 2024-08-14 10:41:54,066 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 26 from Vox, 32 from AS 2024-08-14 10:42:06,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2614620.0, ans=0.1 2024-08-14 10:42:11,093 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.49 vs. 
limit=15.0 2024-08-14 10:42:21,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2614720.0, ans=0.125 2024-08-14 10:42:21,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2614720.0, ans=0.125 2024-08-14 10:42:24,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2614720.0, ans=0.125 2024-08-14 10:42:40,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2614820.0, ans=0.125 2024-08-14 10:42:40,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2614820.0, ans=0.07 2024-08-14 10:43:03,948 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 650, loss[loss=0.1165, beats_loss=0.01097, ecapa_loss=0.000156, whisper_loss=0.104, over 22410.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01058, ecapa_loss=0.0001543, whisper_loss=0.08902, over 3662807.16 frames. 
], batch size: 90, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:43:16,013 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.155e+01 2024-08-14 10:43:17,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2615120.0, ans=0.05 2024-08-14 10:43:25,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2615120.0, ans=0.0 2024-08-14 10:43:32,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2615220.0, ans=0.1 2024-08-14 10:43:40,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2615220.0, ans=0.0 2024-08-14 10:43:46,751 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 18 from Vox, 28 from AS 2024-08-14 10:43:57,799 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 23 from LS+wenet, 27 from Vox, 44 from AS 2024-08-14 10:44:07,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2615420.0, ans=0.125 2024-08-14 10:44:13,294 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 700, loss[loss=0.1101, beats_loss=0.007846, ecapa_loss=0.000131, whisper_loss=0.101, over 16904.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01058, ecapa_loss=0.0001549, whisper_loss=0.0897, over 3708591.22 frames. 
], batch size: 61, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:44:19,897 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.455e+01 2.625e+01 2.898e+01 4.319e+01, threshold=5.251e+01, percent-clipped=0.0 2024-08-14 10:44:25,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=2615620.0, ans=0.1 2024-08-14 10:44:28,024 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 25 from Vox, 40 from AS 2024-08-14 10:44:30,657 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 18 from LS+wenet, 23 from Vox, 30 from AS 2024-08-14 10:45:01,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2615820.0, ans=0.1 2024-08-14 10:45:05,531 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 18 from LS+wenet, 23 from Vox, 33 from AS 2024-08-14 10:45:07,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2615920.0, ans=0.2 2024-08-14 10:45:08,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2615920.0, ans=0.1 2024-08-14 10:45:14,228 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.42 vs. limit=12.0 2024-08-14 10:45:15,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2615920.0, ans=0.125 2024-08-14 10:45:19,245 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 21 from LS+wenet, 12 from Vox, 20 from AS 2024-08-14 10:45:20,522 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 750, loss[loss=0.1217, beats_loss=0.008939, ecapa_loss=0.0001593, whisper_loss=0.1112, over 13885.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.01055, ecapa_loss=0.0001552, whisper_loss=0.08938, over 3721461.22 frames. ], batch size: 53, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:45:27,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2616020.0, ans=0.025 2024-08-14 10:45:31,170 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 19 from Vox, 34 from AS 2024-08-14 10:46:00,617 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.57 vs. limit=15.0 2024-08-14 10:46:22,400 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 24 from Vox, 43 from AS 2024-08-14 10:46:27,271 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 800, loss[loss=0.1065, beats_loss=0.01, ecapa_loss=0.000123, whisper_loss=0.09525, over 19731.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01054, ecapa_loss=0.0001545, whisper_loss=0.08907, over 3727672.53 frames. ], batch size: 74, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:46:28,861 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 27 from LS+wenet, 14 from Vox, 32 from AS 2024-08-14 10:46:33,955 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.304e+01 2.552e+01 2.845e+01 4.485e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-14 10:46:49,857 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.01 vs. 
limit=15.0 2024-08-14 10:46:58,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2616720.0, ans=0.1 2024-08-14 10:46:59,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2616720.0, ans=0.125 2024-08-14 10:47:32,141 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 20 from Vox, 36 from AS 2024-08-14 10:47:32,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2616920.0, ans=0.125 2024-08-14 10:47:33,475 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 15 from LS+wenet, 14 from Vox, 24 from AS 2024-08-14 10:47:34,639 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 850, loss[loss=0.09472, beats_loss=0.01096, ecapa_loss=0.0001391, whisper_loss=0.08236, over 13914.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01057, ecapa_loss=0.000154, whisper_loss=0.08874, over 3737492.55 frames. ], batch size: 53, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:47:43,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2617020.0, ans=0.125 2024-08-14 10:47:48,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2617120.0, ans=0.125 2024-08-14 10:47:49,651 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 29 from LS+wenet, 15 from Vox, 38 from AS 2024-08-14 10:47:50,934 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 13 from Vox, 35 from AS 2024-08-14 10:47:53,573 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
24 from LS+wenet, 6 from Vox, 30 from AS 2024-08-14 10:48:27,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2617420.0, ans=0.125 2024-08-14 10:48:41,281 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 13 from LS+wenet, 14 from Vox, 28 from AS 2024-08-14 10:48:42,399 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 900, loss[loss=0.09017, beats_loss=0.01271, ecapa_loss=0.0001183, whisper_loss=0.07627, over 14154.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01058, ecapa_loss=0.0001534, whisper_loss=0.08871, over 3762473.71 frames. ], batch size: 55, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:48:42,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2617520.0, ans=0.2 2024-08-14 10:48:49,482 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.308e+01 2.548e+01 2.901e+01 4.285e+01, threshold=5.097e+01, percent-clipped=0.0 2024-08-14 10:48:50,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2617520.0, ans=0.125 2024-08-14 10:48:56,180 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 15 from LS+wenet, 17 from Vox, 28 from AS 2024-08-14 10:48:57,485 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 from AS 2024-08-14 10:48:57,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2617620.0, ans=0.0 2024-08-14 10:49:23,850 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 20 from Vox, 20 from AS 2024-08-14 10:49:24,776 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. 
limit=6.0 2024-08-14 10:49:31,078 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 23 from LS+wenet, 18 from Vox, 22 from AS 2024-08-14 10:49:45,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2617920.0, ans=0.07 2024-08-14 10:49:48,517 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 23 from LS+wenet, 15 from Vox, 19 from AS 2024-08-14 10:49:49,507 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 950, loss[loss=0.1183, beats_loss=0.008294, ecapa_loss=0.0001503, whisper_loss=0.1085, over 15478.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01054, ecapa_loss=0.0001533, whisper_loss=0.08908, over 3762327.69 frames. ], batch size: 57, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:49:59,204 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 31 from LS+wenet, 21 from Vox, 32 from AS 2024-08-14 10:50:14,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2618120.0, ans=0.125 2024-08-14 10:50:46,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2618420.0, ans=0.2 2024-08-14 10:50:55,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2618520.0, ans=0.2 2024-08-14 10:50:57,226 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 1000, loss[loss=0.09544, beats_loss=0.01062, ecapa_loss=0.000158, whisper_loss=0.08323, over 20457.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0105, ecapa_loss=0.0001537, whisper_loss=0.08976, over 3781226.67 frames. 
], batch size: 83, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:51:03,750 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.735e+01 2.412e+01 2.681e+01 3.043e+01 1.164e+02, threshold=5.362e+01, percent-clipped=2.0 2024-08-14 10:51:09,075 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 29 from Vox, 32 from AS 2024-08-14 10:51:17,320 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.55 vs. limit=15.0 2024-08-14 10:51:21,002 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 from AS 2024-08-14 10:51:31,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2618720.0, ans=0.0 2024-08-14 10:51:38,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2618820.0, ans=0.1 2024-08-14 10:51:46,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2618820.0, ans=0.125 2024-08-14 10:51:48,279 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.03 vs. limit=10.0 2024-08-14 10:51:49,083 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 18 from LS+wenet, 20 from Vox, 36 from AS 2024-08-14 10:52:03,788 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 1050, loss[loss=0.08775, beats_loss=0.009948, ecapa_loss=0.0001754, whisper_loss=0.07604, over 15845.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01051, ecapa_loss=0.0001547, whisper_loss=0.08926, over 3782951.15 frames. ], batch size: 64, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:52:33,186 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
20 from LS+wenet, 18 from Vox, 43 from AS 2024-08-14 10:52:41,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2619220.0, ans=0.125 2024-08-14 10:52:43,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2619320.0, ans=0.125 2024-08-14 10:52:43,506 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.26 vs. limit=22.5 2024-08-14 10:52:45,579 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 16 from Vox, 27 from AS 2024-08-14 10:52:54,364 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.19 vs. limit=22.5 2024-08-14 10:52:57,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2619420.0, ans=0.125 2024-08-14 10:53:08,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2619420.0, ans=0.125 2024-08-14 10:53:11,174 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 1100, loss[loss=0.1031, beats_loss=0.01115, ecapa_loss=0.0001478, whisper_loss=0.09043, over 16958.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.0001544, whisper_loss=0.08988, over 3752278.84 frames. ], batch size: 68, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:53:12,678 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
29 from LS+wenet, 25 from Vox, 39 from AS 2024-08-14 10:53:17,219 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.365e+01 2.665e+01 2.962e+01 1.430e+02, threshold=5.329e+01, percent-clipped=2.0 2024-08-14 10:53:19,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2619520.0, ans=0.1 2024-08-14 10:53:37,856 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.90 vs. limit=15.0 2024-08-14 10:53:42,819 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 13 from LS+wenet, 15 from Vox, 28 from AS 2024-08-14 10:53:53,290 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 22 from Vox, 30 from AS 2024-08-14 10:54:01,296 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 24 from Vox, 27 from AS 2024-08-14 10:54:03,326 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.86 vs. limit=10.0 2024-08-14 10:54:12,527 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 28 from Vox, 22 from AS 2024-08-14 10:54:13,851 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 23 from LS+wenet, 17 from Vox, 20 from AS 2024-08-14 10:54:17,476 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 1150, loss[loss=0.1128, beats_loss=0.01032, ecapa_loss=0.0001889, whisper_loss=0.1005, over 22743.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01053, ecapa_loss=0.0001543, whisper_loss=0.08936, over 3753860.65 frames. ], batch size: 92, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:54:25,142 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. 
limit=6.0 2024-08-14 10:54:40,518 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 22 from LS+wenet, 23 from Vox, 39 from AS 2024-08-14 10:54:44,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2620220.0, ans=0.125 2024-08-14 10:54:58,964 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.09 vs. limit=15.0 2024-08-14 10:55:08,980 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 22 from LS+wenet, 12 from Vox, 25 from AS 2024-08-14 10:55:10,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2620420.0, ans=0.125 2024-08-14 10:55:11,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=2620420.0, ans=10.0 2024-08-14 10:55:24,667 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 1200, loss[loss=0.08994, beats_loss=0.0111, ecapa_loss=0.000136, whisper_loss=0.07748, over 20467.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01055, ecapa_loss=0.0001534, whisper_loss=0.08963, over 3748749.20 frames. ], batch size: 80, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:55:31,586 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.349e+01 2.616e+01 2.854e+01 5.362e+01, threshold=5.231e+01, percent-clipped=1.0 2024-08-14 10:55:41,185 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 15 from Vox, 23 from AS 2024-08-14 10:55:56,893 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.78 vs. limit=22.5 2024-08-14 10:56:13,836 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 26 from LS+wenet, 15 from Vox, 30 from AS 2024-08-14 10:56:26,746 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
10 from LS+wenet, 18 from Vox, 28 from AS 2024-08-14 10:56:31,872 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 1250, loss[loss=0.1014, beats_loss=0.009791, ecapa_loss=0.0001551, whisper_loss=0.09009, over 20508.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01057, ecapa_loss=0.0001516, whisper_loss=0.08944, over 3743599.76 frames. ], batch size: 81, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:56:39,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2621020.0, ans=0.2 2024-08-14 10:56:43,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2621020.0, ans=0.125 2024-08-14 10:56:44,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2621120.0, ans=0.0 2024-08-14 10:56:50,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2621120.0, ans=0.125 2024-08-14 10:56:52,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2621120.0, ans=0.125 2024-08-14 10:56:52,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2621120.0, ans=0.1 2024-08-14 10:56:59,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2621220.0, ans=0.0 2024-08-14 10:57:05,808 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 22 from Vox, 20 from AS 2024-08-14 10:57:12,979 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.23 vs. limit=22.5 2024-08-14 10:57:16,725 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
17 from LS+wenet, 26 from Vox, 25 from AS 2024-08-14 10:57:21,808 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 25 from LS+wenet, 15 from Vox, 30 from AS 2024-08-14 10:57:28,502 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 18 from Vox, 24 from AS 2024-08-14 10:57:32,458 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 17 from Vox, 25 from AS 2024-08-14 10:57:39,416 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 1300, loss[loss=0.1327, beats_loss=0.005717, ecapa_loss=0.0002009, whisper_loss=0.125, over 17206.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01048, ecapa_loss=0.0001521, whisper_loss=0.09064, over 3750998.15 frames. ], batch size: 67, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:57:45,732 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.286e+01 2.497e+01 2.754e+01 3.684e+01, threshold=4.994e+01, percent-clipped=0.0 2024-08-14 10:57:52,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2621620.0, ans=0.125 2024-08-14 10:58:03,874 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.33 vs. limit=10.0 2024-08-14 10:58:18,357 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2621820.0, ans=0.125 2024-08-14 10:58:21,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2621820.0, ans=0.1 2024-08-14 10:58:21,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2621820.0, ans=0.0 2024-08-14 10:58:32,064 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
19 from LS+wenet, 18 from Vox, 37 from AS 2024-08-14 10:58:34,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2621920.0, ans=0.125 2024-08-14 10:58:45,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2622020.0, ans=0.0 2024-08-14 10:58:46,615 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 1350, loss[loss=0.1045, beats_loss=0.01021, ecapa_loss=0.0001622, whisper_loss=0.09271, over 22827.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01058, ecapa_loss=0.0001514, whisper_loss=0.09007, over 3763366.81 frames. ], batch size: 93, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:59:09,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2622120.0, ans=0.1 2024-08-14 10:59:20,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2622220.0, ans=0.2 2024-08-14 10:59:26,677 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 17 from Vox, 27 from AS 2024-08-14 10:59:45,591 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 15 from LS+wenet, 30 from Vox, 34 from AS 2024-08-14 10:59:49,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2622420.0, ans=0.125 2024-08-14 10:59:52,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2622520.0, ans=0.125 2024-08-14 10:59:53,321 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 1400, loss[loss=0.08586, beats_loss=0.01109, ecapa_loss=0.0001561, whisper_loss=0.0732, over 21696.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01053, ecapa_loss=0.0001524, whisper_loss=0.09019, over 3804597.00 frames. 
], batch size: 89, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:59:54,826 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-14 10:59:59,982 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.361e+01 2.575e+01 2.810e+01 4.774e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-14 11:00:01,645 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-14 11:00:11,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2622620.0, ans=0.125 2024-08-14 11:00:15,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2622620.0, ans=0.125 2024-08-14 11:00:22,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2622720.0, ans=0.0 2024-08-14 11:00:31,846 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.870e+05 2024-08-14 11:00:52,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2622920.0, ans=0.125 2024-08-14 11:00:56,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2622920.0, ans=0.125 2024-08-14 11:00:59,957 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 1450, loss[loss=0.1166, beats_loss=0.01141, ecapa_loss=0.0001327, whisper_loss=0.1038, over 21260.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01054, ecapa_loss=0.0001519, whisper_loss=0.08967, over 3800281.34 frames. 
], batch size: 83, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:01:14,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2623020.0, ans=0.1 2024-08-14 11:01:19,402 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.40 vs. limit=15.0 2024-08-14 11:01:21,002 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.48 vs. limit=15.0 2024-08-14 11:01:24,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2623020.0, ans=0.2 2024-08-14 11:01:31,191 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-14 11:01:45,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2623220.0, ans=0.2 2024-08-14 11:01:55,442 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 23 from LS+wenet, 15 from Vox, 17 fro AS 2024-08-14 11:01:57,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2623320.0, ans=0.125 2024-08-14 11:02:06,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2623320.0, ans=10.0 2024-08-14 11:02:23,579 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 1500, loss[loss=0.09975, beats_loss=0.00995, ecapa_loss=0.0001287, whisper_loss=0.08852, over 19009.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01055, ecapa_loss=0.0001525, whisper_loss=0.08951, over 3803382.53 frames. 
], batch size: 71, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:02:28,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2623520.0, ans=0.0 2024-08-14 11:02:28,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2623520.0, ans=0.0 2024-08-14 11:02:30,952 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.366e+01 2.617e+01 2.967e+01 6.359e+01, threshold=5.234e+01, percent-clipped=3.0 2024-08-14 11:02:41,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2623620.0, ans=0.125 2024-08-14 11:02:43,901 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-14 11:03:05,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2623720.0, ans=0.0 2024-08-14 11:03:08,109 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-14 11:03:09,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2623820.0, ans=0.2 2024-08-14 11:03:21,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2623920.0, ans=0.0 2024-08-14 11:03:31,304 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 11:03:31,684 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.27 vs. limit=15.0 2024-08-14 11:03:35,819 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.65 vs. 
limit=15.0 2024-08-14 11:03:36,642 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-14 11:03:37,887 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 1550, loss[loss=0.1018, beats_loss=0.008898, ecapa_loss=0.0002014, whisper_loss=0.09086, over 19587.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01056, ecapa_loss=0.0001525, whisper_loss=0.09006, over 3801755.56 frames. ], batch size: 81, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:03:58,999 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 20 from LS+wenet, 24 from Vox, 49 fro AS 2024-08-14 11:04:01,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2624120.0, ans=0.1 2024-08-14 11:04:17,144 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-14 11:04:18,397 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-14 11:04:29,432 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-14 11:04:35,347 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 11:04:38,953 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.62 vs. limit=10.0 2024-08-14 11:04:46,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2624420.0, ans=0.125 2024-08-14 11:04:54,179 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 1600, loss[loss=0.1018, beats_loss=0.009448, ecapa_loss=0.000189, whisper_loss=0.09042, over 20049.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001518, whisper_loss=0.09001, over 3832223.49 frames. 
], batch size: 82, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:05:01,271 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.369e+01 2.524e+01 2.843e+01 4.192e+01, threshold=5.047e+01, percent-clipped=0.0 2024-08-14 11:05:12,862 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 16 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 11:05:27,545 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 28 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-14 11:05:45,079 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-14 11:05:48,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2624820.0, ans=0.125 2024-08-14 11:06:09,997 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 1650, loss[loss=0.1277, beats_loss=0.008562, ecapa_loss=0.0001829, whisper_loss=0.1173, over 22473.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01045, ecapa_loss=0.0001524, whisper_loss=0.09094, over 3822647.14 frames. ], batch size: 91, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:06:11,429 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-14 11:06:13,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2625020.0, ans=0.0 2024-08-14 11:06:14,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2625020.0, ans=0.125 2024-08-14 11:06:19,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2625020.0, ans=0.1 2024-08-14 11:06:36,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2625120.0, ans=0.125 2024-08-14 11:06:38,581 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
20 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-14 11:06:52,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2625220.0, ans=0.2 2024-08-14 11:06:52,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2625220.0, ans=0.125 2024-08-14 11:07:01,796 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 26 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 11:07:05,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2625320.0, ans=0.2 2024-08-14 11:07:14,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2625420.0, ans=0.2 2024-08-14 11:07:25,641 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 1700, loss[loss=0.09768, beats_loss=0.008476, ecapa_loss=0.0001203, whisper_loss=0.088, over 15225.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01043, ecapa_loss=0.0001523, whisper_loss=0.09167, over 3813638.05 frames. ], batch size: 56, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:07:26,458 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.67 vs. limit=15.0 2024-08-14 11:07:32,942 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.263e+01 2.524e+01 2.794e+01 4.972e+01, threshold=5.048e+01, percent-clipped=0.0 2024-08-14 11:08:00,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2625720.0, ans=0.1 2024-08-14 11:08:05,764 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 11:08:12,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2625820.0, ans=0.1 2024-08-14 11:08:18,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2625820.0, ans=0.1 2024-08-14 11:08:20,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2625820.0, ans=0.1 2024-08-14 11:08:41,418 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 1750, loss[loss=0.09238, beats_loss=0.0105, ecapa_loss=0.0001141, whisper_loss=0.08074, over 15737.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01042, ecapa_loss=0.0001522, whisper_loss=0.09129, over 3826057.47 frames. ], batch size: 57, lr: 3.32e-03, grad_scale: 1.152921504606847e+18 2024-08-14 11:08:44,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2626020.0, ans=0.125 2024-08-14 11:08:52,295 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.27 vs. limit=22.5 2024-08-14 11:08:58,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2626120.0, ans=0.125 2024-08-14 11:09:01,511 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.70 vs. limit=10.0 2024-08-14 11:09:02,682 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 25 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-14 11:09:05,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2626120.0, ans=0.125 2024-08-14 11:09:24,494 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
32 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-14 11:09:24,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2626320.0, ans=0.125 2024-08-14 11:09:27,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2626320.0, ans=0.0 2024-08-14 11:09:33,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2626320.0, ans=0.125 2024-08-14 11:09:33,233 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.46 vs. limit=15.0 2024-08-14 11:09:55,062 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 1800, loss[loss=0.09156, beats_loss=0.01303, ecapa_loss=0.0001543, whisper_loss=0.07698, over 21523.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001513, whisper_loss=0.09, over 3827941.83 frames. ], batch size: 91, lr: 3.32e-03, grad_scale: 1.152921504606847e+18 2024-08-14 11:10:00,574 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 11:10:05,042 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.260e+01 2.552e+01 2.816e+01 4.964e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-14 11:10:24,555 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 11:10:34,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2626720.0, ans=0.2 2024-08-14 11:10:38,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2626720.0, ans=0.125 2024-08-14 11:10:51,274 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=22.5 2024-08-14 11:11:08,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2626920.0, ans=0.0 2024-08-14 11:11:09,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2627020.0, ans=0.125 2024-08-14 11:11:11,068 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 1850, loss[loss=0.1053, beats_loss=0.01027, ecapa_loss=0.0001563, whisper_loss=0.09348, over 18339.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01048, ecapa_loss=0.0001518, whisper_loss=0.09033, over 3818456.92 frames. ], batch size: 71, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:11:16,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2627020.0, ans=0.0 2024-08-14 11:11:35,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2627120.0, ans=0.125 2024-08-14 11:12:04,815 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
21 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-14 11:12:07,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2627320.0, ans=0.1 2024-08-14 11:12:11,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2627420.0, ans=0.2 2024-08-14 11:12:23,146 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.19 vs. limit=22.5 2024-08-14 11:12:25,099 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 1900, loss[loss=0.103, beats_loss=0.01327, ecapa_loss=0.0001251, whisper_loss=0.08845, over 23189.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01058, ecapa_loss=0.0001515, whisper_loss=0.08995, over 3797855.02 frames. ], batch size: 92, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:12:25,280 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 17 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-14 11:12:29,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2627520.0, ans=0.1 2024-08-14 11:12:29,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2627520.0, ans=0.2 2024-08-14 11:12:33,416 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.313e+01 2.505e+01 2.769e+01 4.411e+01, threshold=5.010e+01, percent-clipped=0.0 2024-08-14 11:13:27,562 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
27 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-14 11:13:36,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2627920.0, ans=15.0 2024-08-14 11:13:39,286 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 1950, loss[loss=0.09772, beats_loss=0.01116, ecapa_loss=0.0001589, whisper_loss=0.08497, over 22899.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01061, ecapa_loss=0.0001513, whisper_loss=0.08993, over 3798765.72 frames. ], batch size: 90, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:13:44,407 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 11:13:45,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2628020.0, ans=0.125 2024-08-14 11:14:18,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2628220.0, ans=0.125 2024-08-14 11:14:31,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2628320.0, ans=0.05 2024-08-14 11:14:34,998 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.50 vs. limit=15.0 2024-08-14 11:14:44,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2628420.0, ans=0.0 2024-08-14 11:14:54,971 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 2000, loss[loss=0.08413, beats_loss=0.01035, ecapa_loss=0.0001609, whisper_loss=0.07216, over 13452.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01053, ecapa_loss=0.0001518, whisper_loss=0.09011, over 3788567.02 frames. 
], batch size: 53, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:14:59,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2628520.0, ans=0.025 2024-08-14 11:15:04,506 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.355e+01 2.639e+01 2.929e+01 2.426e+02, threshold=5.277e+01, percent-clipped=2.0 2024-08-14 11:15:27,664 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-14 11:15:31,925 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 11:16:10,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2628920.0, ans=0.125 2024-08-14 11:16:12,429 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 2050, loss[loss=0.09803, beats_loss=0.0126, ecapa_loss=0.0001577, whisper_loss=0.08385, over 23453.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01059, ecapa_loss=0.0001517, whisper_loss=0.08974, over 3790527.90 frames. ], batch size: 93, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:16:20,300 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-14 11:16:35,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2629120.0, ans=0.1 2024-08-14 11:16:37,380 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.88 vs. limit=10.0 2024-08-14 11:16:38,405 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
16 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-14 11:16:53,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2629220.0, ans=0.125 2024-08-14 11:17:09,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2629320.0, ans=0.125 2024-08-14 11:17:30,955 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 2100, loss[loss=0.09508, beats_loss=0.01046, ecapa_loss=0.0001175, whisper_loss=0.08344, over 17819.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01065, ecapa_loss=0.0001504, whisper_loss=0.08918, over 3778900.26 frames. ], batch size: 66, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:17:37,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2629520.0, ans=0.025 2024-08-14 11:17:39,372 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.688e+01 2.267e+01 2.457e+01 2.784e+01 3.709e+01, threshold=4.913e+01, percent-clipped=0.0 2024-08-14 11:17:59,308 WARNING [optim.py:496] (2/4) Scaling gradients by 0.09374744445085526, model_norm_threshold=49.13302230834961 2024-08-14 11:17:59,510 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.27, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.336e+04, grad_sumsq=7.336e+04, orig_rms_sq=1.000e+00 2024-08-14 11:18:00,601 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 33 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-14 11:18:07,948 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
24 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-14 11:18:22,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2629820.0, ans=0.2 2024-08-14 11:18:30,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2629820.0, ans=0.0 2024-08-14 11:18:32,129 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.93 vs. limit=6.0 2024-08-14 11:18:33,890 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.80 vs. limit=22.5 2024-08-14 11:18:42,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2629920.0, ans=0.05 2024-08-14 11:18:49,662 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 2150, loss[loss=0.125, beats_loss=0.009135, ecapa_loss=0.0001287, whisper_loss=0.1146, over 17459.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01075, ecapa_loss=0.0001493, whisper_loss=0.08979, over 3805646.34 frames. ], batch size: 63, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:18:51,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2630020.0, ans=0.0 2024-08-14 11:19:14,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2630120.0, ans=0.125 2024-08-14 11:19:22,899 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 
26 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-14 11:19:26,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2630220.0, ans=0.125 2024-08-14 11:19:30,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2630220.0, ans=0.125 2024-08-14 11:19:32,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2630220.0, ans=0.125 2024-08-14 11:19:37,753 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.52 vs. limit=12.0 2024-08-14 11:19:51,550 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 25 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-14 11:19:53,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2630420.0, ans=0.125 2024-08-14 11:20:08,781 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 2200, loss[loss=0.1182, beats_loss=0.00905, ecapa_loss=0.0001822, whisper_loss=0.1074, over 20855.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01075, ecapa_loss=0.0001502, whisper_loss=0.09048, over 3833752.28 frames. ], batch size: 85, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:20:17,589 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.405e+01 2.616e+01 2.970e+01 5.241e+02, threshold=5.232e+01, percent-clipped=2.0 2024-08-14 11:20:25,971 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
22 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-14 11:20:26,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2630620.0, ans=0.09899494936611666 2024-08-14 11:20:56,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2630820.0, ans=0.125 2024-08-14 11:20:59,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2630820.0, ans=0.125 2024-08-14 11:21:04,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2630820.0, ans=0.125 2024-08-14 11:21:12,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2630920.0, ans=0.0 2024-08-14 11:21:23,528 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 26 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 11:21:29,088 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 2250, loss[loss=0.09794, beats_loss=0.01301, ecapa_loss=0.0001421, whisper_loss=0.08351, over 22815.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0108, ecapa_loss=0.0001512, whisper_loss=0.09055, over 3844857.14 frames. ], batch size: 95, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:21:30,824 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 11:21:34,785 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
25 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-14 11:22:00,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2631220.0, ans=0.1 2024-08-14 11:22:03,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2631220.0, ans=0.05 2024-08-14 11:22:05,419 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.73 vs. limit=15.0 2024-08-14 11:22:13,937 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 22 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-14 11:22:48,213 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-14 11:22:49,257 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 2300, loss[loss=0.09914, beats_loss=0.01232, ecapa_loss=0.0001365, whisper_loss=0.08545, over 21005.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01092, ecapa_loss=0.0001518, whisper_loss=0.09013, over 3842019.66 frames. ], batch size: 84, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:22:58,670 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.348e+01 2.609e+01 2.849e+01 2.533e+02, threshold=5.217e+01, percent-clipped=1.0 2024-08-14 11:23:28,943 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 27 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 11:23:42,735 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
29 from LS+wenet, 24 from Vox, 40 from AS
2024-08-14 11:23:50,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2631820.0, ans=0.125
2024-08-14 11:24:02,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2631920.0, ans=0.125
2024-08-14 11:24:08,797 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 2350, loss[loss=0.1166, beats_loss=0.008997, ecapa_loss=0.0001331, whisper_loss=0.1063, over 14447.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01087, ecapa_loss=0.000151, whisper_loss=0.09034, over 3814640.00 frames. ], batch size: 55, lr: 3.32e-03, grad_scale: 5.764607523034235e+17
2024-08-14 11:24:15,199 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 from AS
2024-08-14 11:24:20,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2632020.0, ans=0.2
2024-08-14 11:24:43,333 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 14 from Vox, 26 from AS
2024-08-14 11:25:29,178 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 2400, loss[loss=0.112, beats_loss=0.009983, ecapa_loss=0.0001743, whisper_loss=0.1002, over 22689.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01079, ecapa_loss=0.0001521, whisper_loss=0.09073, over 3853848.16 frames. ], batch size: 89, lr: 3.32e-03, grad_scale: 5.764607523034235e+17
2024-08-14 11:25:34,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2632520.0, ans=0.125
2024-08-14 11:25:35,004 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.23 vs. limit=6.0
2024-08-14 11:25:35,617 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 22 from LS+wenet, 11 from Vox, 26 from AS
2024-08-14 11:25:38,059 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.318e+01 2.574e+01 2.948e+01 5.851e+01, threshold=5.149e+01, percent-clipped=1.0
2024-08-14 11:25:42,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2632520.0, ans=0.125
2024-08-14 11:25:46,561 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.42 vs. limit=15.0
2024-08-14 11:25:49,202 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 19 from LS+wenet, 21 from Vox, 44 from AS
2024-08-14 11:26:14,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2632820.0, ans=0.125
2024-08-14 11:26:22,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2632820.0, ans=0.04949747468305833
2024-08-14 11:26:33,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2632920.0, ans=0.125
2024-08-14 11:26:38,289 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.42 vs. limit=15.0
2024-08-14 11:26:46,785 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 2450, loss[loss=0.09232, beats_loss=0.01098, ecapa_loss=0.0001658, whisper_loss=0.07967, over 19080.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01076, ecapa_loss=0.0001521, whisper_loss=0.0899, over 3843535.62 frames. ], batch size: 80, lr: 3.32e-03, grad_scale: 5.764607523034235e+17
2024-08-14 11:26:49,790 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.22 vs. limit=22.5
2024-08-14 11:27:08,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2633120.0, ans=0.5
2024-08-14 11:27:19,130 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 15 from Vox, 28 from AS
2024-08-14 11:27:22,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2633220.0, ans=0.5
2024-08-14 11:27:25,310 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 18 from LS+wenet, 20 from Vox, 42 from AS
2024-08-14 11:27:30,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2633220.0, ans=0.0
2024-08-14 11:27:31,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2633320.0, ans=0.2
2024-08-14 11:27:35,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2633320.0, ans=10.0
2024-08-14 11:28:03,466 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 2500, loss[loss=0.1103, beats_loss=0.01015, ecapa_loss=0.0001292, whisper_loss=0.09886, over 20710.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01076, ecapa_loss=0.0001524, whisper_loss=0.08985, over 3835371.76 frames. ], batch size: 79, lr: 3.32e-03, grad_scale: 5.764607523034235e+17
2024-08-14 11:28:12,001 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.237e+01 2.442e+01 2.717e+01 4.288e+01, threshold=4.884e+01, percent-clipped=0.0
2024-08-14 11:28:23,612 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 14 from Vox, 43 from AS
2024-08-14 11:28:23,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2633620.0, ans=0.1
2024-08-14 11:28:28,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2633620.0, ans=0.125
2024-08-14 11:28:31,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2633620.0, ans=0.1
2024-08-14 11:28:33,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=2633720.0, ans=0.5
2024-08-14 11:28:46,181 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.31 vs. limit=12.0
2024-08-14 11:28:57,931 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 15 from LS+wenet, 17 from Vox, 35 from AS
2024-08-14 11:29:20,803 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 2550, loss[loss=0.1208, beats_loss=0.007132, ecapa_loss=0.0001507, whisper_loss=0.1122, over 18678.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01072, ecapa_loss=0.0001526, whisper_loss=0.09026, over 3844668.69 frames. ], batch size: 70, lr: 3.32e-03, grad_scale: 5.764607523034235e+17
2024-08-14 11:30:07,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2634320.0, ans=0.125
2024-08-14 11:30:10,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2634320.0, ans=0.09899494936611666
2024-08-14 11:30:20,845 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 17 from Vox, 29 from AS
2024-08-14 11:30:30,162 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.75 vs. limit=15.0
2024-08-14 11:30:40,418 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 2600, loss[loss=0.1099, beats_loss=0.01213, ecapa_loss=0.0001484, whisper_loss=0.09632, over 20944.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01078, ecapa_loss=0.0001529, whisper_loss=0.08991, over 3858755.90 frames. ], batch size: 82, lr: 3.32e-03, grad_scale: 5.764607523034235e+17
2024-08-14 11:30:49,116 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.475e+01 2.751e+01 3.111e+01 1.109e+02, threshold=5.502e+01, percent-clipped=3.0
2024-08-14 11:31:07,283 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-14 11:31:07,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2634620.0, ans=0.125
2024-08-14 11:31:17,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2634720.0, ans=0.2
2024-08-14 11:31:27,908 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 24 from Vox, 22 from AS
2024-08-14 11:31:46,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2634920.0, ans=0.0
2024-08-14 11:31:55,137 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 18 from Vox, 38 from AS
2024-08-14 11:31:58,230 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 2650, loss[loss=0.111, beats_loss=0.01169, ecapa_loss=0.0001306, whisper_loss=0.09802, over 23059.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01077, ecapa_loss=0.0001521, whisper_loss=0.09007, over 3887334.90 frames. ], batch size: 91, lr: 3.32e-03, grad_scale: 5.764607523034235e+17
2024-08-14 11:32:15,599 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.24 vs. limit=22.5
2024-08-14 11:32:21,387 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.09 vs. limit=12.0
2024-08-14 11:32:27,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2635220.0, ans=0.1
2024-08-14 11:32:54,994 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.57 vs. limit=10.0
2024-08-14 11:33:05,294 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=15.0
2024-08-14 11:33:13,696 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 2700, loss[loss=0.09439, beats_loss=0.01233, ecapa_loss=0.0001502, whisper_loss=0.08056, over 18711.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01074, ecapa_loss=0.0001525, whisper_loss=0.08981, over 3867878.85 frames. ], batch size: 72, lr: 3.32e-03, grad_scale: 5.764607523034235e+17
2024-08-14 11:33:22,068 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.370e+01 2.672e+01 3.079e+01 4.287e+01, threshold=5.344e+01, percent-clipped=0.0
2024-08-14 11:33:32,726 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 13 from Vox, 31 from AS
2024-08-14 11:33:34,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2635620.0, ans=0.125
2024-08-14 11:34:06,861 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 18 from Vox, 35 from AS
2024-08-14 11:34:21,561 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 22 from Vox, 25 from AS
2024-08-14 11:34:28,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2635920.0, ans=0.125
2024-08-14 11:34:37,824 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 2750, loss[loss=0.1096, beats_loss=0.007164, ecapa_loss=0.000211, whisper_loss=0.1004, over 13550.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01067, ecapa_loss=0.0001532, whisper_loss=0.08989, over 3836495.12 frames. ], batch size: 53, lr: 3.32e-03, grad_scale: 5.764607523034235e+17
2024-08-14 11:35:06,354 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 19 from Vox, 23 from AS
2024-08-14 11:35:22,670 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.54 vs. limit=15.0
2024-08-14 11:36:07,548 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 2800, loss[loss=0.09876, beats_loss=0.01139, ecapa_loss=0.0001174, whisper_loss=0.08619, over 15000.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01064, ecapa_loss=0.0001528, whisper_loss=0.09021, over 3843266.39 frames. ], batch size: 55, lr: 3.32e-03, grad_scale: 5.764607523034235e+17
2024-08-14 11:36:18,858 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-14 11:36:19,659 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.323e+01 2.596e+01 2.984e+01 3.829e+01, threshold=5.192e+01, percent-clipped=0.0
2024-08-14 11:36:31,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2636620.0, ans=0.1
2024-08-14 11:36:34,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2636620.0, ans=0.0
2024-08-14 11:36:59,750 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 40 from LS+wenet, 19 from Vox, 29 from AS
2024-08-14 11:37:18,328 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.78 vs. limit=15.0
2024-08-14 11:37:48,252 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 2850, loss[loss=0.08466, beats_loss=0.01353, ecapa_loss=0.0001555, whisper_loss=0.06957, over 18457.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01057, ecapa_loss=0.0001538, whisper_loss=0.09083, over 3850080.48 frames. ], batch size: 75, lr: 3.31e-03, grad_scale: 5.764607523034235e+17
2024-08-14 11:37:48,469 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 22 from Vox, 32 from AS
2024-08-14 11:37:54,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2637020.0, ans=0.1
2024-08-14 11:37:58,499 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.95 vs. limit=22.5
2024-08-14 11:38:23,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2637120.0, ans=0.035
2024-08-14 11:38:27,434 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 34 from LS+wenet, 19 from Vox, 34 from AS
2024-08-14 11:38:32,808 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 from AS
2024-08-14 11:38:36,825 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.32 vs. limit=10.0
2024-08-14 11:38:44,241 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 18 from LS+wenet, 19 from Vox, 38 from AS
2024-08-14 11:38:50,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2637220.0, ans=0.125
2024-08-14 11:38:56,801 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0
2024-08-14 11:39:08,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2637320.0, ans=0.125
2024-08-14 11:39:39,000 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.62 vs. limit=15.0
2024-08-14 11:39:48,374 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.48 vs. limit=6.0
2024-08-14 11:39:51,061 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 2900, loss[loss=0.1026, beats_loss=0.01021, ecapa_loss=0.0001475, whisper_loss=0.09087, over 19489.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01061, ecapa_loss=0.000154, whisper_loss=0.09077, over 3880420.08 frames. ], batch size: 73, lr: 3.31e-03, grad_scale: 5.764607523034235e+17
2024-08-14 11:40:05,933 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.270e+01 2.545e+01 2.877e+01 7.977e+01, threshold=5.090e+01, percent-clipped=2.0
2024-08-14 11:40:53,782 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 32 from Vox, 34 from AS
2024-08-14 11:41:11,310 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.70 vs. limit=15.0
2024-08-14 11:41:39,959 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 21 from LS+wenet, 17 from Vox, 48 from AS
2024-08-14 11:41:47,617 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 14 from Vox, 38 from AS
2024-08-14 11:41:48,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2637920.0, ans=10.0
2024-08-14 11:41:56,638 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.19 vs. limit=12.0
2024-08-14 11:41:57,212 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 2950, loss[loss=0.07937, beats_loss=0.01261, ecapa_loss=0.0001194, whisper_loss=0.06556, over 15723.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01065, ecapa_loss=0.0001543, whisper_loss=0.09025, over 3882205.75 frames. ], batch size: 62, lr: 3.31e-03, grad_scale: 5.764607523034235e+17
2024-08-14 11:42:06,406 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 16 from LS+wenet, 23 from Vox, 28 from AS
2024-08-14 11:42:31,931 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 25 from LS+wenet, 16 from Vox, 25 from AS
2024-08-14 11:42:59,059 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=8.408e-02
2024-08-14 11:43:05,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2638220.0, ans=0.125
2024-08-14 11:43:11,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2638320.0, ans=0.0
2024-08-14 11:43:52,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2638420.0, ans=0.0
2024-08-14 11:43:56,108 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 3000, loss[loss=0.1053, beats_loss=0.009618, ecapa_loss=0.0001384, whisper_loss=0.09434, over 18399.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01055, ecapa_loss=0.0001545, whisper_loss=0.09075, over 3894578.94 frames. ], batch size: 74, lr: 3.31e-03, grad_scale: 5.764607523034235e+17
2024-08-14 11:43:56,109 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss
2024-08-14 11:44:34,397 INFO [train_multi_KD3.py:1149] (2/4) Epoch 19, validation on ASR_libri: loss=0.2526, beats_loss=0, ecapa_loss=0.0005472, whisper_loss=0.2471, over 922467.00 frames.
2024-08-14 11:44:51,394 INFO [train_multi_KD3.py:1149] (2/4) Epoch 19, validation on SV_voxceleb1: loss=0.00425, beats_loss=0, ecapa_loss=0.000425, whisper_loss=0, over 939242.00 frames.
2024-08-14 11:45:55,692 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.8876, 1.8391, 1.9896, 1.8850, 2.4680, 1.9036, 2.0142, 1.8991], device='cuda:2')
2024-08-14 11:46:48,046 INFO [train_multi_KD3.py:1149] (2/4) Epoch 19, validation on AT_audioset: loss=0.02345, beats_loss=0.02345, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-14 11:46:48,050 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB
2024-08-14 11:46:51,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2638520.0, ans=0.125
2024-08-14 11:46:54,485 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 17 from Vox, 25 from AS
2024-08-14 11:46:57,808 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.515e+01 2.846e+01 3.137e+01 6.212e+01, threshold=5.693e+01, percent-clipped=1.0
2024-08-14 11:47:04,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2638620.0, ans=0.125
2024-08-14 11:47:28,663 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 24 from Vox, 34 from AS
2024-08-14 11:47:34,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2638720.0, ans=0.2
2024-08-14 11:47:38,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2638820.0, ans=0.125
2024-08-14 11:47:43,074 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 21 from Vox, 35 from AS
2024-08-14 11:47:44,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2638820.0, ans=0.0
2024-08-14 11:47:44,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2638820.0, ans=0.1
2024-08-14 11:47:50,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2638920.0, ans=0.0
2024-08-14 11:47:58,571 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 13 from Vox, 49 from AS
2024-08-14 11:47:59,193 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.05 vs. limit=15.0
2024-08-14 11:48:07,508 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 3050, loss[loss=0.1148, beats_loss=0.01109, ecapa_loss=0.0001315, whisper_loss=0.1024, over 23452.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01064, ecapa_loss=0.0001539, whisper_loss=0.09109, over 3887519.70 frames. ], batch size: 92, lr: 3.31e-03, grad_scale: 5.764607523034235e+17
2024-08-14 11:48:07,794 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 25 from LS+wenet, 37 from Vox, 32 from AS
2024-08-14 11:48:27,001 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.10 vs. limit=12.0
2024-08-14 11:48:43,985 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.12 vs. limit=10.0
2024-08-14 11:48:47,475 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 from AS
2024-08-14 11:49:05,121 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 23 from LS+wenet, 10 from Vox, 29 from AS
2024-08-14 11:49:23,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2639420.0, ans=0.125
2024-08-14 11:49:29,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2639520.0, ans=0.125
2024-08-14 11:49:30,158 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 3100, loss[loss=0.109, beats_loss=0.009209, ecapa_loss=0.0001911, whisper_loss=0.0979, over 20125.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0106, ecapa_loss=0.000155, whisper_loss=0.09155, over 3893323.15 frames. ], batch size: 83, lr: 3.31e-03, grad_scale: 5.764607523034235e+17
2024-08-14 11:49:32,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2639520.0, ans=0.0
2024-08-14 11:49:39,669 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.342e+01 2.628e+01 3.036e+01 4.820e+01, threshold=5.256e+01, percent-clipped=0.0
2024-08-14 11:49:52,035 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 17 from Vox, 29 from AS
2024-08-14 11:50:06,275 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 20 from LS+wenet, 18 from Vox, 38 from AS
2024-08-14 11:50:09,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2639720.0, ans=0.2
2024-08-14 11:50:16,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2639820.0, ans=0.125
2024-08-14 11:50:39,661 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 16 from LS+wenet, 18 from Vox, 33 from AS
2024-08-14 11:50:41,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2639920.0, ans=0.125
2024-08-14 11:50:49,419 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 3150, loss[loss=0.1251, beats_loss=0.008066, ecapa_loss=0.000179, whisper_loss=0.1152, over 21860.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01061, ecapa_loss=0.0001564, whisper_loss=0.09164, over 3884977.66 frames. ], batch size: 88, lr: 3.31e-03, grad_scale: 5.764607523034235e+17
2024-08-14 11:51:21,165 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 26 from LS+wenet, 20 from Vox, 22 from AS
2024-08-14 11:51:35,944 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 22 from Vox, 22 from AS
2024-08-14 11:51:36,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2640320.0, ans=0.0
2024-08-14 11:51:50,710 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 23 from LS+wenet, 19 from Vox, 21 from AS
2024-08-14 11:51:55,055 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 25 from LS+wenet, 19 from Vox, 20 from AS
2024-08-14 11:52:04,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2640420.0, ans=0.0
2024-08-14 11:52:06,888 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 3200, loss[loss=0.09896, beats_loss=0.009475, ecapa_loss=0.0001504, whisper_loss=0.08798, over 16730.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01057, ecapa_loss=0.0001573, whisper_loss=0.0925, over 3880272.50 frames. ], batch size: 62, lr: 3.31e-03, grad_scale: 5.764607523034235e+17
2024-08-14 11:52:16,908 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.565e+01 2.359e+01 2.591e+01 2.913e+01 5.020e+01, threshold=5.181e+01, percent-clipped=0.0
2024-08-14 11:52:24,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2640620.0, ans=0.0
2024-08-14 11:52:34,603 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.29 vs. limit=10.0
2024-08-14 11:52:54,867 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 18 from Vox, 28 from AS
2024-08-14 11:53:06,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2640820.0, ans=0.1
2024-08-14 11:53:16,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2640920.0, ans=0.0
2024-08-14 11:53:22,991 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 3250, loss[loss=0.1104, beats_loss=0.01056, ecapa_loss=0.0001756, whisper_loss=0.09805, over 19557.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01058, ecapa_loss=0.0001567, whisper_loss=0.09267, over 3864801.06 frames. ], batch size: 80, lr: 3.31e-03, grad_scale: 5.764607523034235e+17
2024-08-14 11:53:23,276 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 12 from Vox, 31 from AS
2024-08-14 11:53:24,318 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.45 vs. limit=15.0
2024-08-14 11:53:55,611 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.79 vs. limit=22.5
2024-08-14 11:53:56,369 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 from AS
2024-08-14 11:54:07,293 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 27 from Vox, 35 from AS
2024-08-14 11:54:16,648 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.29 vs. limit=6.0
2024-08-14 11:54:18,753 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 15 from LS+wenet, 10 from Vox, 42 from AS
2024-08-14 11:54:29,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2641420.0, ans=0.125
2024-08-14 11:54:34,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2641420.0, ans=0.2
2024-08-14 11:54:43,982 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 3300, loss[loss=0.1012, beats_loss=0.01073, ecapa_loss=0.0001429, whisper_loss=0.08902, over 22161.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01071, ecapa_loss=0.0001561, whisper_loss=0.09184, over 3896234.66 frames. ], batch size: 86, lr: 3.31e-03, grad_scale: 5.764607523034235e+17
2024-08-14 11:54:47,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2641520.0, ans=0.1
2024-08-14 11:54:54,013 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.805e+01 2.343e+01 2.686e+01 3.135e+01 1.274e+02, threshold=5.372e+01, percent-clipped=3.0
2024-08-14 11:55:07,254 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 27 from Vox, 27 from AS
2024-08-14 11:55:10,501 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 19 from LS+wenet, 19 from Vox, 34 from AS
2024-08-14 11:55:10,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=2641620.0, ans=0.1
2024-08-14 11:55:18,503 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 23 from Vox, 35 from AS
2024-08-14 11:55:27,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2641720.0, ans=0.2
2024-08-14 11:55:29,229 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.40 vs. limit=15.0
2024-08-14 11:55:48,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2641920.0, ans=0.125
2024-08-14 11:55:57,163 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 23 from Vox, 31 from AS
2024-08-14 11:56:04,037 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 3350, loss[loss=0.08478, beats_loss=0.01382, ecapa_loss=0.0002169, whisper_loss=0.06879, over 20559.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01073, ecapa_loss=0.0001563, whisper_loss=0.0911, over 3896532.91 frames. ], batch size: 91, lr: 3.31e-03, grad_scale: 5.764607523034235e+17
2024-08-14 11:56:07,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2642020.0, ans=0.1
2024-08-14 11:56:29,958 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 from AS
2024-08-14 11:56:37,943 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 37 from LS+wenet, 22 from Vox, 34 from AS
2024-08-14 11:56:41,380 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 20 from Vox, 27 from AS
2024-08-14 11:57:08,136 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 38 from Vox, 28 from AS
2024-08-14 11:57:17,475 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 26 from Vox, 41 from AS
2024-08-14 11:57:23,062 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 3400, loss[loss=0.1022, beats_loss=0.009785, ecapa_loss=0.0001326, whisper_loss=0.0911, over 17391.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01069, ecapa_loss=0.0001566, whisper_loss=0.09134, over 3896984.50 frames. ], batch size: 65, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 11:57:32,911 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 24 from LS+wenet, 20 from Vox, 41 from AS
2024-08-14 11:57:34,543 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.486e+01 2.836e+01 3.327e+01 1.695e+02, threshold=5.673e+01, percent-clipped=4.0
2024-08-14 11:57:51,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2642620.0, ans=0.0
2024-08-14 11:57:59,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2642720.0, ans=0.1
2024-08-14 11:58:04,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2642720.0, ans=0.1
2024-08-14 11:58:12,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2642820.0, ans=0.0
2024-08-14 11:58:13,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2642820.0, ans=0.1
2024-08-14 11:58:43,975 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 3450, loss[loss=0.1124, beats_loss=0.01242, ecapa_loss=0.0001292, whisper_loss=0.09867, over 23272.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01068, ecapa_loss=0.0001573, whisper_loss=0.09114, over 3911671.36 frames. ], batch size: 90, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 11:58:51,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2643020.0, ans=0.0
2024-08-14 11:58:52,300 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 17 from Vox, 22 from AS
2024-08-14 11:59:00,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2643120.0, ans=0.0
2024-08-14 11:59:05,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2643120.0, ans=0.04949747468305833
2024-08-14 11:59:22,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2643220.0, ans=0.2
2024-08-14 11:59:38,181 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 23 from Vox, 37 from AS
2024-08-14 11:59:41,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2643320.0, ans=0.125
2024-08-14 11:59:51,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2643420.0, ans=0.125
2024-08-14 11:59:53,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2643420.0, ans=0.125
2024-08-14 11:59:56,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2643420.0, ans=0.0
2024-08-14 11:59:56,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2643420.0, ans=0.0
2024-08-14 12:00:02,637 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-14 12:00:03,597 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 3500, loss[loss=0.1333, beats_loss=0.007655, ecapa_loss=0.0001458, whisper_loss=0.1242, over 19941.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01072, ecapa_loss=0.000157, whisper_loss=0.09081, over 3920408.26 frames. ], batch size: 74, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:00:11,986 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.27 vs. limit=10.0
2024-08-14 12:00:16,078 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.678e+01 2.311e+01 2.583e+01 2.814e+01 3.893e+01, threshold=5.167e+01, percent-clipped=0.0
2024-08-14 12:00:43,152 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 from AS
2024-08-14 12:00:50,628 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.67 vs. limit=15.0
2024-08-14 12:00:52,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2643720.0, ans=0.0
2024-08-14 12:00:58,702 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 23 from Vox, 47 from AS
2024-08-14 12:01:03,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2643820.0, ans=0.2
2024-08-14 12:01:08,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2643820.0, ans=0.125
2024-08-14 12:01:14,773 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.14 vs. limit=10.0
2024-08-14 12:01:21,886 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 19 from LS+wenet, 19 from Vox, 33 from AS
2024-08-14 12:01:26,610 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 3550, loss[loss=0.09984, beats_loss=0.01083, ecapa_loss=0.0001654, whisper_loss=0.08735, over 22205.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01071, ecapa_loss=0.0001566, whisper_loss=0.09052, over 3929338.67 frames. ], batch size: 92, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 12:01:38,698 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 16 from Vox, 29 from AS
2024-08-14 12:01:40,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2644020.0, ans=0.0
2024-08-14 12:01:50,016 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0
2024-08-14 12:02:14,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2644220.0, ans=0.125
2024-08-14 12:02:20,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2644320.0, ans=0.2
2024-08-14 12:02:29,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2644320.0, ans=0.0
2024-08-14 12:02:29,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=2644320.0, ans=12.0
2024-08-14 12:02:34,975 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 34 from LS+wenet, 13 from Vox, 28 from AS
2024-08-14 12:02:44,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2644420.0, ans=0.0
2024-08-14 12:02:51,345 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 3600, loss[loss=0.09977, beats_loss=0.01081, ecapa_loss=0.0001568, whisper_loss=0.08739, over 19314.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01066, ecapa_loss=0.0001569, whisper_loss=0.09158, over 3956082.07 frames.
], batch size: 79, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:02:53,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2644520.0, ans=0.1 2024-08-14 12:03:01,956 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.481e+01 2.628e+01 2.850e+01 4.421e+01, threshold=5.257e+01, percent-clipped=0.0 2024-08-14 12:03:04,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2644520.0, ans=0.1 2024-08-14 12:03:33,354 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.38 vs. limit=22.5 2024-08-14 12:04:04,725 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.72 vs. limit=12.0 2024-08-14 12:04:08,652 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 3650, loss[loss=0.07816, beats_loss=0.01234, ecapa_loss=0.0001151, whisper_loss=0.06467, over 16707.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01062, ecapa_loss=0.0001573, whisper_loss=0.0916, over 3906213.19 frames. ], batch size: 65, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:04:12,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2645020.0, ans=0.125 2024-08-14 12:04:20,362 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-14 12:04:24,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2645120.0, ans=0.125 2024-08-14 12:04:25,303 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
23 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 12:04:31,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2645120.0, ans=0.1 2024-08-14 12:04:37,623 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 27 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-14 12:05:14,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2645420.0, ans=0.125 2024-08-14 12:05:24,173 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 3700, loss[loss=0.105, beats_loss=0.01166, ecapa_loss=0.0001267, whisper_loss=0.0921, over 14703.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01063, ecapa_loss=0.0001566, whisper_loss=0.09171, over 3876982.33 frames. ], batch size: 56, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:05:25,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2645520.0, ans=0.0 2024-08-14 12:05:27,062 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 11 from Vox, 37 fro AS 2024-08-14 12:05:33,745 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.298e+01 2.531e+01 2.738e+01 1.071e+02, threshold=5.062e+01, percent-clipped=1.0 2024-08-14 12:05:35,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2645520.0, ans=0.0 2024-08-14 12:05:41,068 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
42 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-14 12:05:45,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2645620.0, ans=0.1 2024-08-14 12:06:11,626 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 12:06:13,544 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=15.0 2024-08-14 12:06:35,448 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-14 12:06:39,005 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 3750, loss[loss=0.0959, beats_loss=0.01122, ecapa_loss=0.0001921, whisper_loss=0.08276, over 20944.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0107, ecapa_loss=0.0001565, whisper_loss=0.09105, over 3886987.31 frames. ], batch size: 91, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:06:52,741 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-14 12:07:01,316 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-14 12:07:24,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2646320.0, ans=0.2 2024-08-14 12:07:56,003 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 3800, loss[loss=0.1308, beats_loss=0.008261, ecapa_loss=0.0001626, whisper_loss=0.1209, over 21365.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01071, ecapa_loss=0.000157, whisper_loss=0.09104, over 3899471.21 frames. 
], batch size: 80, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:08:05,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2646520.0, ans=0.025 2024-08-14 12:08:05,963 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.055e+01 2.378e+01 2.670e+01 2.953e+01 4.426e+01, threshold=5.341e+01, percent-clipped=0.0 2024-08-14 12:08:09,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2646620.0, ans=0.025 2024-08-14 12:08:14,906 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-14 12:08:21,040 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-14 12:08:23,636 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-14 12:08:24,850 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.17 vs. limit=15.0 2024-08-14 12:08:40,633 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.46 vs. limit=12.0 2024-08-14 12:08:44,445 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-14 12:08:47,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2646820.0, ans=10.0 2024-08-14 12:08:48,896 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
18 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-14 12:08:49,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2646820.0, ans=0.0 2024-08-14 12:08:58,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2646920.0, ans=0.05 2024-08-14 12:09:14,117 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 3850, loss[loss=0.1142, beats_loss=0.01081, ecapa_loss=0.0001774, whisper_loss=0.1016, over 21967.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01078, ecapa_loss=0.0001556, whisper_loss=0.09036, over 3875008.44 frames. ], batch size: 90, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:09:15,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2647020.0, ans=0.125 2024-08-14 12:09:38,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2647120.0, ans=0.125 2024-08-14 12:09:54,856 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.74 vs. limit=10.0 2024-08-14 12:09:54,945 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.36 vs. limit=22.5 2024-08-14 12:09:58,687 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
17 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-14 12:10:24,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2647420.0, ans=0.2 2024-08-14 12:10:31,007 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 12:10:34,757 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.06 vs. limit=22.5 2024-08-14 12:10:35,091 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 3900, loss[loss=0.1197, beats_loss=0.009817, ecapa_loss=0.0001716, whisper_loss=0.1082, over 15942.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01078, ecapa_loss=0.0001563, whisper_loss=0.09007, over 3856811.15 frames. ], batch size: 61, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:10:48,091 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.360e+01 2.691e+01 2.914e+01 3.544e+02, threshold=5.383e+01, percent-clipped=1.0 2024-08-14 12:10:58,729 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
15 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-14 12:11:15,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2647720.0, ans=0.5 2024-08-14 12:11:18,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2647720.0, ans=0.1 2024-08-14 12:11:20,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2647720.0, ans=0.0 2024-08-14 12:11:40,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2647820.0, ans=0.2 2024-08-14 12:11:47,803 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.96 vs. limit=15.0 2024-08-14 12:11:58,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2647920.0, ans=0.1 2024-08-14 12:12:05,089 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 3950, loss[loss=0.1008, beats_loss=0.01132, ecapa_loss=0.0001494, whisper_loss=0.08803, over 19896.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01072, ecapa_loss=0.0001569, whisper_loss=0.09072, over 3870588.47 frames. ], batch size: 79, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:12:28,499 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.57 vs. limit=15.0 2024-08-14 12:12:33,138 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
21 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-14 12:12:36,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2648120.0, ans=0.125 2024-08-14 12:12:36,667 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.94 vs. limit=15.0 2024-08-14 12:12:42,134 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.50 vs. limit=6.0 2024-08-14 12:12:56,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2648220.0, ans=0.0 2024-08-14 12:13:25,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2648320.0, ans=0.0 2024-08-14 12:13:28,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2648420.0, ans=0.125 2024-08-14 12:13:29,374 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 22 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-14 12:13:30,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2648420.0, ans=0.05 2024-08-14 12:13:31,256 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 25 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-14 12:13:32,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2648420.0, ans=0.1 2024-08-14 12:13:52,320 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 4000, loss[loss=0.108, beats_loss=0.01254, ecapa_loss=0.0001469, whisper_loss=0.09395, over 20740.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01072, ecapa_loss=0.0001571, whisper_loss=0.09114, over 3877408.61 frames. 
], batch size: 80, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:14:06,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2648520.0, ans=0.05 2024-08-14 12:14:07,574 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.464e+01 2.683e+01 2.941e+01 4.279e+01, threshold=5.366e+01, percent-clipped=0.0 2024-08-14 12:14:11,089 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-14 12:14:38,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2648720.0, ans=0.1 2024-08-14 12:14:42,115 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-14 12:14:47,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2648720.0, ans=0.1 2024-08-14 12:15:15,831 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.257e+00 2024-08-14 12:15:18,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2648820.0, ans=0.1 2024-08-14 12:15:36,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2648920.0, ans=0.0 2024-08-14 12:15:52,692 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 4050, loss[loss=0.09343, beats_loss=0.01257, ecapa_loss=0.0001558, whisper_loss=0.0793, over 20507.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01069, ecapa_loss=0.0001567, whisper_loss=0.09158, over 3917429.75 frames. 
], batch size: 84, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:16:12,039 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.48 vs. limit=15.0 2024-08-14 12:16:23,113 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-14 12:16:32,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=2649220.0, ans=0.05 2024-08-14 12:16:43,468 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-14 12:16:57,314 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 12:16:59,971 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 19 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-14 12:17:02,473 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=12.0 2024-08-14 12:17:07,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2649420.0, ans=0.125 2024-08-14 12:17:20,836 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 4100, loss[loss=0.1061, beats_loss=0.00895, ecapa_loss=0.0001642, whisper_loss=0.09548, over 21609.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01071, ecapa_loss=0.0001565, whisper_loss=0.09157, over 3922425.90 frames. 
], batch size: 84, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:17:33,407 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.700e+01 2.297e+01 2.541e+01 2.897e+01 6.382e+01, threshold=5.082e+01, percent-clipped=1.0 2024-08-14 12:18:19,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2649820.0, ans=0.04949747468305833 2024-08-14 12:18:23,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2649820.0, ans=0.0 2024-08-14 12:18:36,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2649920.0, ans=0.1 2024-08-14 12:18:37,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2649920.0, ans=0.0 2024-08-14 12:18:53,348 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 4150, loss[loss=0.08675, beats_loss=0.009957, ecapa_loss=0.0001575, whisper_loss=0.07522, over 18692.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0107, ecapa_loss=0.0001557, whisper_loss=0.09198, over 3904210.65 frames. ], batch size: 73, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:18:55,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2650020.0, ans=0.1 2024-08-14 12:18:59,370 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.08 vs. limit=15.0 2024-08-14 12:19:00,388 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 25 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 12:19:18,043 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
26 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-14 12:19:30,500 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.55 vs. limit=15.0 2024-08-14 12:19:37,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2650220.0, ans=0.2 2024-08-14 12:19:38,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2650220.0, ans=0.1 2024-08-14 12:19:39,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2650220.0, ans=0.125 2024-08-14 12:19:46,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2650320.0, ans=0.2 2024-08-14 12:19:47,834 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 25 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-14 12:19:49,785 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 21 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-14 12:19:54,972 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-14 12:20:12,188 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 12:20:16,686 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 4200, loss[loss=0.09807, beats_loss=0.009836, ecapa_loss=0.0001946, whisper_loss=0.08629, over 20075.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01062, ecapa_loss=0.0001566, whisper_loss=0.09174, over 3872392.41 frames. ], batch size: 86, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:20:27,452 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.390e+01 2.581e+01 2.872e+01 4.290e+01, threshold=5.161e+01, percent-clipped=0.0 2024-08-14 12:20:32,841 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
30 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-14 12:20:40,042 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.99 vs. limit=12.0 2024-08-14 12:20:42,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2650620.0, ans=0.125 2024-08-14 12:20:42,805 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.14 vs. limit=15.0 2024-08-14 12:20:48,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2650720.0, ans=0.125 2024-08-14 12:20:51,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2650720.0, ans=0.0 2024-08-14 12:20:51,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2650720.0, ans=0.1 2024-08-14 12:20:56,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2650720.0, ans=0.125 2024-08-14 12:21:01,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2650720.0, ans=0.2 2024-08-14 12:21:11,084 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=9.421e-02 2024-08-14 12:21:15,225 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-14 12:21:31,076 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.99 vs. 
limit=22.5 2024-08-14 12:21:36,306 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 4250, loss[loss=0.1106, beats_loss=0.01008, ecapa_loss=0.0001617, whisper_loss=0.09895, over 20514.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0106, ecapa_loss=0.0001572, whisper_loss=0.09147, over 3858523.22 frames. ], batch size: 83, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:21:57,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2651120.0, ans=0.0 2024-08-14 12:22:02,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2651120.0, ans=0.125 2024-08-14 12:22:16,972 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 17 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-14 12:22:18,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2651220.0, ans=0.1 2024-08-14 12:22:24,471 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.73 vs. limit=12.0 2024-08-14 12:22:38,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2651320.0, ans=0.125 2024-08-14 12:22:40,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2651320.0, ans=0.0 2024-08-14 12:22:54,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2651420.0, ans=0.0 2024-08-14 12:22:57,756 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 4300, loss[loss=0.08697, beats_loss=0.01273, ecapa_loss=0.0001417, whisper_loss=0.07283, over 17794.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01063, ecapa_loss=0.0001564, whisper_loss=0.09123, over 3879820.46 frames. 
], batch size: 73, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:23:01,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2651520.0, ans=0.125 2024-08-14 12:23:08,852 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.461e+01 2.630e+01 3.002e+01 3.746e+02, threshold=5.260e+01, percent-clipped=1.0 2024-08-14 12:23:17,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2651620.0, ans=0.0 2024-08-14 12:23:51,581 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-14 12:23:53,315 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-14 12:23:59,008 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-14 12:24:11,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2651920.0, ans=0.125 2024-08-14 12:24:15,545 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 4350, loss[loss=0.09062, beats_loss=0.01307, ecapa_loss=0.0001717, whisper_loss=0.07583, over 20483.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01064, ecapa_loss=0.0001567, whisper_loss=0.09094, over 3844016.53 frames. 
], batch size: 89, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:24:34,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2652120.0, ans=0.125 2024-08-14 12:24:37,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2652120.0, ans=0.1 2024-08-14 12:24:40,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2652120.0, ans=0.0 2024-08-14 12:24:42,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2652120.0, ans=0.2 2024-08-14 12:24:46,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2652220.0, ans=0.125 2024-08-14 12:24:46,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2652220.0, ans=0.0 2024-08-14 12:24:57,884 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.65 vs. limit=12.0 2024-08-14 12:25:06,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=2652320.0, ans=15.0 2024-08-14 12:25:11,639 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-14 12:25:14,779 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-14 12:25:30,645 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 4400, loss[loss=0.08028, beats_loss=0.009525, ecapa_loss=0.0001362, whisper_loss=0.0694, over 15729.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01064, ecapa_loss=0.0001561, whisper_loss=0.09108, over 3845036.10 frames. 
], batch size: 59, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:25:32,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2652520.0, ans=0.1 2024-08-14 12:25:39,544 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-14 12:25:40,711 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.397e+01 2.574e+01 2.948e+01 5.281e+01, threshold=5.148e+01, percent-clipped=1.0 2024-08-14 12:25:43,966 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-14 12:25:45,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2652620.0, ans=0.1 2024-08-14 12:25:47,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2652620.0, ans=0.0 2024-08-14 12:25:48,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2652620.0, ans=0.0 2024-08-14 12:25:49,748 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 30 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-14 12:25:51,107 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-14 12:26:07,445 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.10 vs. 
limit=15.0 2024-08-14 12:26:10,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2652720.0, ans=0.125 2024-08-14 12:26:32,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=2652920.0, ans=15.0 2024-08-14 12:26:38,202 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 12:26:43,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2653020.0, ans=0.1 2024-08-14 12:26:43,945 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 4450, loss[loss=0.09618, beats_loss=0.01102, ecapa_loss=0.0001799, whisper_loss=0.08336, over 21813.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01065, ecapa_loss=0.0001556, whisper_loss=0.0908, over 3848028.93 frames. ], batch size: 91, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:26:46,267 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.32 vs. limit=15.0 2024-08-14 12:27:06,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2653120.0, ans=0.1 2024-08-14 12:27:18,419 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.518e-03 2024-08-14 12:27:29,745 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
30 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-14 12:27:43,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2653420.0, ans=0.125 2024-08-14 12:27:55,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2653420.0, ans=0.125 2024-08-14 12:27:57,462 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=22.5 2024-08-14 12:27:57,774 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 4500, loss[loss=0.09872, beats_loss=0.01115, ecapa_loss=0.0001531, whisper_loss=0.08604, over 20258.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01069, ecapa_loss=0.0001553, whisper_loss=0.09027, over 3869349.04 frames. ], batch size: 81, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:28:03,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2653520.0, ans=0.0 2024-08-14 12:28:05,530 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-14 12:28:07,068 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-14 12:28:08,302 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.507e+01 2.296e+01 2.547e+01 2.865e+01 4.084e+01, threshold=5.093e+01, percent-clipped=0.0 2024-08-14 12:28:15,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2653620.0, ans=0.125 2024-08-14 12:28:16,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2653620.0, ans=0.09899494936611666 2024-08-14 12:28:22,472 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
16 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 12:28:24,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2653620.0, ans=0.0 2024-08-14 12:29:13,862 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 4550, loss[loss=0.1039, beats_loss=0.01294, ecapa_loss=0.0001079, whisper_loss=0.08988, over 23280.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01068, ecapa_loss=0.0001562, whisper_loss=0.09023, over 3884886.22 frames. ], batch size: 88, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:29:19,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2654020.0, ans=0.1 2024-08-14 12:29:23,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2654020.0, ans=0.125 2024-08-14 12:29:28,461 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.59 vs. limit=22.5 2024-08-14 12:29:43,125 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 12:29:51,473 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.93 vs. limit=15.0 2024-08-14 12:29:54,479 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.24 vs. limit=15.0 2024-08-14 12:29:55,035 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 34 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-14 12:30:14,071 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
30 from LS+wenet, 12 from Vox, 49 fro AS 2024-08-14 12:30:15,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2654420.0, ans=0.0 2024-08-14 12:30:19,952 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-14 12:30:29,087 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 4600, loss[loss=0.09476, beats_loss=0.01232, ecapa_loss=0.000123, whisper_loss=0.08121, over 21220.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01071, ecapa_loss=0.000156, whisper_loss=0.09015, over 3887169.79 frames. ], batch size: 84, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:30:34,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2654520.0, ans=0.125 2024-08-14 12:30:39,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2654520.0, ans=0.1 2024-08-14 12:30:39,992 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.344e+01 2.580e+01 2.840e+01 1.542e+02, threshold=5.160e+01, percent-clipped=2.0 2024-08-14 12:30:40,335 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
28 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-14 12:30:51,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2654620.0, ans=0.5 2024-08-14 12:30:52,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2654620.0, ans=0.125 2024-08-14 12:30:54,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2654620.0, ans=0.0 2024-08-14 12:31:03,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2654720.0, ans=0.125 2024-08-14 12:31:13,388 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.59 vs. limit=12.0 2024-08-14 12:31:14,903 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.82 vs. limit=15.0 2024-08-14 12:31:19,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2654820.0, ans=0.125 2024-08-14 12:31:19,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2654820.0, ans=0.09899494936611666 2024-08-14 12:31:23,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2654820.0, ans=0.0 2024-08-14 12:31:27,497 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.15 vs. limit=15.0 2024-08-14 12:31:41,243 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.21 vs. 
limit=15.0 2024-08-14 12:31:42,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2654920.0, ans=0.0 2024-08-14 12:31:46,945 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 4650, loss[loss=0.1257, beats_loss=0.01016, ecapa_loss=0.0001549, whisper_loss=0.114, over 21259.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01067, ecapa_loss=0.0001564, whisper_loss=0.09084, over 3885621.08 frames. ], batch size: 83, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:31:54,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2655020.0, ans=0.0 2024-08-14 12:32:03,476 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-14 12:32:24,356 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 24 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-14 12:32:25,755 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 12:33:13,289 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 4700, loss[loss=0.1166, beats_loss=0.009919, ecapa_loss=0.0001961, whisper_loss=0.1048, over 19036.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0107, ecapa_loss=0.0001575, whisper_loss=0.09094, over 3859557.26 frames. ], batch size: 81, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:33:25,128 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.291e+01 2.499e+01 2.871e+01 5.538e+01, threshold=4.999e+01, percent-clipped=1.0 2024-08-14 12:33:39,478 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
19 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-14 12:33:39,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2655620.0, ans=0.125 2024-08-14 12:34:14,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2655820.0, ans=0.0 2024-08-14 12:34:29,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2655920.0, ans=0.125 2024-08-14 12:34:36,068 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.04 vs. limit=10.0 2024-08-14 12:34:37,980 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 4750, loss[loss=0.0806, beats_loss=0.01273, ecapa_loss=0.0001142, whisper_loss=0.06673, over 17987.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0107, ecapa_loss=0.000157, whisper_loss=0.09039, over 3835484.38 frames. ], batch size: 70, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:34:54,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2656120.0, ans=0.125 2024-08-14 12:35:02,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2656120.0, ans=0.0 2024-08-14 12:35:10,190 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 22 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-14 12:35:28,265 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.63 vs. limit=6.0 2024-08-14 12:35:38,982 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.52 vs. 
limit=15.0 2024-08-14 12:35:47,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2656420.0, ans=0.125 2024-08-14 12:35:50,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2656520.0, ans=0.0 2024-08-14 12:35:51,526 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 4800, loss[loss=0.104, beats_loss=0.0103, ecapa_loss=0.0001657, whisper_loss=0.09201, over 14296.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01074, ecapa_loss=0.0001572, whisper_loss=0.09063, over 3835730.55 frames. ], batch size: 56, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:36:02,332 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.391e+01 2.624e+01 2.971e+01 4.050e+02, threshold=5.248e+01, percent-clipped=2.0 2024-08-14 12:36:19,899 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 28 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-14 12:36:21,278 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-14 12:36:34,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2656820.0, ans=0.0 2024-08-14 12:36:44,272 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.67 vs. limit=6.0 2024-08-14 12:36:53,153 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.86 vs. limit=6.0 2024-08-14 12:36:55,345 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-14 12:36:59,122 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.19 vs. 
limit=22.5 2024-08-14 12:37:05,736 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 4850, loss[loss=0.1041, beats_loss=0.009454, ecapa_loss=0.0001624, whisper_loss=0.09299, over 22544.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01074, ecapa_loss=0.0001558, whisper_loss=0.09102, over 3857459.83 frames. ], batch size: 92, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:37:06,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2657020.0, ans=0.1 2024-08-14 12:37:12,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2657020.0, ans=0.125 2024-08-14 12:37:13,516 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-14 12:37:32,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2657120.0, ans=0.125 2024-08-14 12:37:40,187 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.18 vs. limit=22.5 2024-08-14 12:37:51,839 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-14 12:37:53,114 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 16 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 12:37:57,152 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.30 vs. limit=15.0 2024-08-14 12:38:18,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2657420.0, ans=0.0 2024-08-14 12:38:20,944 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 4900, loss[loss=0.1052, beats_loss=0.01197, ecapa_loss=0.0001449, whisper_loss=0.09183, over 23262.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.01065, ecapa_loss=0.0001573, whisper_loss=0.09142, over 3822971.06 frames. ], batch size: 94, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:38:25,623 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-14 12:38:31,283 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.386e+01 2.578e+01 2.812e+01 7.156e+01, threshold=5.157e+01, percent-clipped=2.0 2024-08-14 12:38:45,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2657620.0, ans=0.125 2024-08-14 12:38:49,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2657620.0, ans=0.125 2024-08-14 12:39:08,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2657820.0, ans=0.125 2024-08-14 12:39:20,564 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-14 12:39:20,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2657920.0, ans=0.125 2024-08-14 12:39:36,459 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.46 vs. limit=15.0 2024-08-14 12:39:36,847 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 4950, loss[loss=0.09185, beats_loss=0.01058, ecapa_loss=0.0001918, whisper_loss=0.07935, over 21297.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01076, ecapa_loss=0.0001574, whisper_loss=0.09045, over 3856152.69 frames. ], batch size: 91, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:39:50,377 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.76 vs. 
limit=22.5 2024-08-14 12:39:57,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2658120.0, ans=0.125 2024-08-14 12:40:16,770 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 12:40:39,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2658420.0, ans=0.125 2024-08-14 12:40:46,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2658420.0, ans=0.0 2024-08-14 12:40:50,175 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 5000, loss[loss=0.09422, beats_loss=0.01088, ecapa_loss=0.000128, whisper_loss=0.08205, over 17900.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01078, ecapa_loss=0.0001574, whisper_loss=0.09049, over 3871423.60 frames. ], batch size: 67, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:40:50,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2658520.0, ans=0.05 2024-08-14 12:41:01,003 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.269e+01 2.546e+01 2.965e+01 4.784e+01, threshold=5.092e+01, percent-clipped=0.0 2024-08-14 12:41:03,357 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2658520.0, ans=0.125 2024-08-14 12:41:05,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2658620.0, ans=0.0 2024-08-14 12:41:13,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2658620.0, ans=0.015 2024-08-14 12:41:22,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2658720.0, ans=0.125 
2024-08-14 12:41:44,584 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-14 12:41:45,314 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.25 vs. limit=12.0 2024-08-14 12:41:52,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2658920.0, ans=0.125 2024-08-14 12:42:00,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2658920.0, ans=0.125 2024-08-14 12:42:05,614 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 5050, loss[loss=0.1078, beats_loss=0.01139, ecapa_loss=0.0001221, whisper_loss=0.0952, over 14673.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01081, ecapa_loss=0.0001568, whisper_loss=0.0906, over 3880682.02 frames. ], batch size: 57, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:42:24,108 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 27 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-14 12:42:30,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2659120.0, ans=0.125 2024-08-14 12:42:32,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2659120.0, ans=0.0 2024-08-14 12:42:48,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2659220.0, ans=0.125 2024-08-14 12:43:08,261 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
28 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-14 12:43:11,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2659420.0, ans=0.125 2024-08-14 12:43:21,866 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 5100, loss[loss=0.09798, beats_loss=0.01131, ecapa_loss=0.0001437, whisper_loss=0.08524, over 22064.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01088, ecapa_loss=0.0001568, whisper_loss=0.09039, over 3885340.90 frames. ], batch size: 89, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:43:24,075 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.640e-03 2024-08-14 12:43:32,127 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.354e+01 2.635e+01 2.968e+01 4.253e+01, threshold=5.269e+01, percent-clipped=0.0 2024-08-14 12:43:40,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2659620.0, ans=0.07 2024-08-14 12:44:11,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2659820.0, ans=0.125 2024-08-14 12:44:36,137 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 5150, loss[loss=0.08902, beats_loss=0.01063, ecapa_loss=0.0001538, whisper_loss=0.07685, over 16613.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01083, ecapa_loss=0.0001563, whisper_loss=0.09072, over 3901175.38 frames. 
], batch size: 67, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:44:36,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2660020.0, ans=0.0 2024-08-14 12:44:36,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2660020.0, ans=0.125 2024-08-14 12:44:37,757 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-14 12:45:08,173 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 26 from Vox, 22 fro AS 2024-08-14 12:45:08,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2660220.0, ans=0.125 2024-08-14 12:45:14,179 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-14 12:45:20,777 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 12:45:27,952 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-14 12:45:37,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2660420.0, ans=0.0 2024-08-14 12:45:46,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2660420.0, ans=0.125 2024-08-14 12:45:51,591 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 5200, loss[loss=0.1329, beats_loss=0.008524, ecapa_loss=0.0001701, whisper_loss=0.1227, over 23100.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01075, ecapa_loss=0.000157, whisper_loss=0.09084, over 3887715.68 frames. 
], batch size: 90, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:45:53,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2660520.0, ans=0.125 2024-08-14 12:46:00,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2660520.0, ans=0.025 2024-08-14 12:46:02,485 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.363e+01 2.791e+01 3.410e+01 2.422e+02, threshold=5.583e+01, percent-clipped=4.0 2024-08-14 12:46:24,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2660720.0, ans=0.125 2024-08-14 12:46:37,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2660820.0, ans=0.125 2024-08-14 12:46:43,209 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
24 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 12:46:45,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2660820.0, ans=0.04949747468305833 2024-08-14 12:46:46,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2660820.0, ans=0.125 2024-08-14 12:46:56,509 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2660920.0, ans=0.0 2024-08-14 12:46:57,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2660920.0, ans=0.0 2024-08-14 12:47:04,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2660920.0, ans=0.0 2024-08-14 12:47:06,568 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 5250, loss[loss=0.09755, beats_loss=0.01076, ecapa_loss=0.0001261, whisper_loss=0.08553, over 15169.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01072, ecapa_loss=0.0001561, whisper_loss=0.09084, over 3869585.31 frames. ], batch size: 59, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:47:17,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2661020.0, ans=0.125 2024-08-14 12:47:28,881 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.68 vs. limit=10.0 2024-08-14 12:47:37,045 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
14 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-14 12:47:50,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2661320.0, ans=0.0 2024-08-14 12:48:19,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2661520.0, ans=0.125 2024-08-14 12:48:20,124 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 5300, loss[loss=0.1033, beats_loss=0.01052, ecapa_loss=0.0001667, whisper_loss=0.09115, over 14249.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0106, ecapa_loss=0.0001557, whisper_loss=0.09113, over 3829395.17 frames. ], batch size: 57, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:48:23,338 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-14 12:48:24,489 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 16 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-14 12:48:24,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2661520.0, ans=0.125 2024-08-14 12:48:29,726 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.292e+01 2.528e+01 2.841e+01 9.142e+01, threshold=5.056e+01, percent-clipped=1.0 2024-08-14 12:48:35,964 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-14 12:48:37,796 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
23 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 12:48:45,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2661620.0, ans=0.125 2024-08-14 12:48:51,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2661720.0, ans=0.125 2024-08-14 12:48:52,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2661720.0, ans=0.0 2024-08-14 12:48:54,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2661720.0, ans=0.07 2024-08-14 12:48:54,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2661720.0, ans=0.05 2024-08-14 12:49:04,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2661820.0, ans=0.2 2024-08-14 12:49:22,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2661920.0, ans=0.125 2024-08-14 12:49:33,635 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 5350, loss[loss=0.1006, beats_loss=0.008599, ecapa_loss=0.0001991, whisper_loss=0.09, over 16341.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01053, ecapa_loss=0.0001562, whisper_loss=0.09137, over 3821642.93 frames. ], batch size: 69, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:49:34,122 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
26 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-14 12:49:45,730 WARNING [optim.py:496] (2/4) Scaling gradients by 0.07321541011333466, model_norm_threshold=50.560279846191406 2024-08-14 12:49:45,929 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.720e+04, grad_sumsq=6.720e+04, orig_rms_sq=1.000e+00 2024-08-14 12:49:49,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2662120.0, ans=0.015 2024-08-14 12:49:51,673 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.73 vs. limit=15.0 2024-08-14 12:50:14,688 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 32 from Vox, 24 fro AS 2024-08-14 12:50:24,636 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.35 vs. limit=15.0 2024-08-14 12:50:25,397 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 15 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 12:50:26,095 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.05 vs. limit=22.5 2024-08-14 12:50:26,866 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-14 12:50:28,443 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 21 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-14 12:50:37,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2662420.0, ans=0.0 2024-08-14 12:50:46,126 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
33 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-14 12:50:48,793 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 5400, loss[loss=0.1263, beats_loss=0.00888, ecapa_loss=0.0001465, whisper_loss=0.116, over 23704.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01055, ecapa_loss=0.0001563, whisper_loss=0.09116, over 3845906.58 frames. ], batch size: 90, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:50:58,335 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.310e+01 2.504e+01 2.679e+01 6.906e+02, threshold=5.009e+01, percent-clipped=1.0 2024-08-14 12:51:03,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2662620.0, ans=0.5 2024-08-14 12:51:35,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2662820.0, ans=0.125 2024-08-14 12:51:56,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2662920.0, ans=0.05 2024-08-14 12:52:00,365 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.84 vs. limit=22.5 2024-08-14 12:52:00,849 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 5450, loss[loss=0.1126, beats_loss=0.008917, ecapa_loss=0.0001622, whisper_loss=0.1021, over 21421.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01063, ecapa_loss=0.0001567, whisper_loss=0.0906, over 3855678.85 frames. ], batch size: 84, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:52:08,245 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
21 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-14 12:52:23,300 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.258e+05 2024-08-14 12:52:26,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2663120.0, ans=0.2 2024-08-14 12:52:36,555 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 12:52:44,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2663320.0, ans=0.125 2024-08-14 12:53:09,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2663420.0, ans=0.0 2024-08-14 12:53:11,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2663420.0, ans=0.2 2024-08-14 12:53:14,787 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 5500, loss[loss=0.1165, beats_loss=0.007959, ecapa_loss=0.0001407, whisper_loss=0.1071, over 17001.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01071, ecapa_loss=0.0001568, whisper_loss=0.09044, over 3864252.87 frames. ], batch size: 62, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:53:24,854 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.510e+01 2.706e+01 3.048e+01 6.260e+01, threshold=5.412e+01, percent-clipped=1.0 2024-08-14 12:53:28,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2663620.0, ans=0.2 2024-08-14 12:53:31,750 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.00 vs. 
limit=15.0 2024-08-14 12:53:38,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2663620.0, ans=0.2 2024-08-14 12:53:51,846 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 51 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-14 12:53:55,995 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.91 vs. limit=10.0 2024-08-14 12:54:02,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2663820.0, ans=0.0 2024-08-14 12:54:25,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2663920.0, ans=0.1 2024-08-14 12:54:28,844 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 5550, loss[loss=0.1093, beats_loss=0.01036, ecapa_loss=0.0001689, whisper_loss=0.09729, over 22859.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01067, ecapa_loss=0.0001569, whisper_loss=0.0909, over 3881810.91 frames. ], batch size: 90, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:54:33,551 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-14 12:54:42,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2664120.0, ans=0.0 2024-08-14 12:54:45,668 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 35 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 12:54:46,823 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
25 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-14 12:54:48,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2664120.0, ans=0.0 2024-08-14 12:55:15,900 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0 2024-08-14 12:55:37,798 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 12:55:43,907 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 5600, loss[loss=0.1127, beats_loss=0.01071, ecapa_loss=0.0001502, whisper_loss=0.1005, over 23321.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01063, ecapa_loss=0.000156, whisper_loss=0.0917, over 3869129.08 frames. ], batch size: 93, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:55:54,385 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.335e+01 2.676e+01 3.034e+01 3.132e+02, threshold=5.352e+01, percent-clipped=2.0 2024-08-14 12:55:57,732 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-14 12:55:59,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2664620.0, ans=0.125 2024-08-14 12:56:02,235 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 12:56:32,784 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 5 from Vox, 37 fro AS 2024-08-14 12:56:48,818 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 33 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-14 12:56:57,469 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 5650, loss[loss=0.08419, beats_loss=0.01261, ecapa_loss=0.0001399, whisper_loss=0.07019, over 20766.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01077, ecapa_loss=0.0001546, whisper_loss=0.0904, over 3879306.85 frames. 
], batch size: 85, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:57:18,168 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-14 12:57:24,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2665120.0, ans=0.0 2024-08-14 12:57:42,172 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.89 vs. limit=10.0 2024-08-14 12:57:58,398 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.13 vs. limit=22.5 2024-08-14 12:58:10,361 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 5700, loss[loss=0.1139, beats_loss=0.009467, ecapa_loss=0.0001836, whisper_loss=0.1026, over 17874.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01066, ecapa_loss=0.0001568, whisper_loss=0.09149, over 3891020.04 frames. ], batch size: 71, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:58:20,807 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.440e+01 2.658e+01 3.007e+01 5.166e+01, threshold=5.317e+01, percent-clipped=0.0 2024-08-14 12:58:38,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2665620.0, ans=0.0 2024-08-14 12:59:10,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2665920.0, ans=0.1 2024-08-14 12:59:11,501 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-14 12:59:18,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2665920.0, ans=0.125 2024-08-14 12:59:25,380 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 5750, loss[loss=0.078, beats_loss=0.01296, ecapa_loss=0.0001674, whisper_loss=0.06336, over 14200.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01068, ecapa_loss=0.0001568, whisper_loss=0.09131, over 3920387.34 frames. ], batch size: 62, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:59:40,988 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-14 12:59:42,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2666120.0, ans=0.125 2024-08-14 12:59:47,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2666120.0, ans=0.125 2024-08-14 12:59:48,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2666120.0, ans=0.125 2024-08-14 13:00:03,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2666220.0, ans=0.2 2024-08-14 13:00:21,548 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2024-08-14 13:00:26,681 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-14 13:00:40,043 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 5800, loss[loss=0.09859, beats_loss=0.009579, ecapa_loss=0.0002001, whisper_loss=0.08701, over 13476.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01066, ecapa_loss=0.0001573, whisper_loss=0.0909, over 3904425.58 frames. 
], batch size: 54, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:00:50,349 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.337e+01 2.671e+01 3.010e+01 5.088e+01, threshold=5.343e+01, percent-clipped=0.0 2024-08-14 13:00:52,153 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 14 from LS+wenet, 26 from Vox, 22 fro AS 2024-08-14 13:00:59,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2666620.0, ans=0.1 2024-08-14 13:01:06,396 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.73 vs. limit=15.0 2024-08-14 13:01:13,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2666720.0, ans=0.0 2024-08-14 13:01:33,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2666820.0, ans=0.1 2024-08-14 13:01:34,519 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-14 13:01:49,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2666920.0, ans=0.125 2024-08-14 13:01:54,789 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 5850, loss[loss=0.08682, beats_loss=0.01407, ecapa_loss=0.0001121, whisper_loss=0.07163, over 13473.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01073, ecapa_loss=0.000156, whisper_loss=0.09042, over 3845666.70 frames. ], batch size: 54, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:02:08,540 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=20.66 vs. 
limit=22.5 2024-08-14 13:02:16,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2667120.0, ans=0.125 2024-08-14 13:02:20,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2667120.0, ans=0.0 2024-08-14 13:02:27,915 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.65 vs. limit=15.0 2024-08-14 13:02:36,215 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-14 13:02:38,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2667320.0, ans=0.0 2024-08-14 13:02:38,573 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.12 vs. limit=15.0 2024-08-14 13:02:40,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2667320.0, ans=0.1 2024-08-14 13:02:48,024 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-14 13:03:08,967 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 5900, loss[loss=0.111, beats_loss=0.009804, ecapa_loss=0.0001678, whisper_loss=0.09953, over 23730.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01071, ecapa_loss=0.0001551, whisper_loss=0.09111, over 3879543.97 frames. ], batch size: 92, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:03:18,976 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+01 2.335e+01 2.608e+01 2.996e+01 4.185e+01, threshold=5.216e+01, percent-clipped=0.0 2024-08-14 13:03:22,708 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
10 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 13:03:25,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2667620.0, ans=0.125 2024-08-14 13:03:53,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2667820.0, ans=0.125 2024-08-14 13:03:55,181 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 14 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-14 13:03:59,361 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-14 13:03:59,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2667820.0, ans=0.125 2024-08-14 13:04:03,743 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 40 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-14 13:04:11,223 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-14 13:04:20,068 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-14 13:04:22,610 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 5950, loss[loss=0.1312, beats_loss=0.007021, ecapa_loss=0.0001668, whisper_loss=0.1225, over 14656.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01074, ecapa_loss=0.0001561, whisper_loss=0.09077, over 3849329.32 frames. ], batch size: 55, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:04:44,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2668120.0, ans=0.05 2024-08-14 13:05:21,319 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
24 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-14 13:05:23,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2668420.0, ans=0.125 2024-08-14 13:05:30,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2668420.0, ans=0.125 2024-08-14 13:05:33,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2668420.0, ans=0.04949747468305833 2024-08-14 13:05:36,982 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.04 vs. limit=15.0 2024-08-14 13:05:37,220 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 6000, loss[loss=0.1076, beats_loss=0.008813, ecapa_loss=0.0001758, whisper_loss=0.097, over 16318.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01066, ecapa_loss=0.0001565, whisper_loss=0.09115, over 3866681.60 frames. ], batch size: 62, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:05:37,220 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-14 13:06:13,746 INFO [train_multi_KD3.py:1149] (2/4) Epoch 19, validation on ASR_libri: loss=0.253, beats_loss=0, ecapa_loss=0.000548, whisper_loss=0.2476, over 922467.00 frames. 2024-08-14 13:06:29,988 INFO [train_multi_KD3.py:1149] (2/4) Epoch 19, validation on SV_voxceleb1: loss=0.004318, beats_loss=0, ecapa_loss=0.0004318, whisper_loss=0, over 939242.00 frames. 2024-08-14 13:08:18,988 INFO [train_multi_KD3.py:1149] (2/4) Epoch 19, validation on AT_audioset: loss=0.02353, beats_loss=0.02353, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-14 13:08:18,993 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-14 13:08:29,209 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.205e+01 2.512e+01 2.812e+01 4.887e+01, threshold=5.023e+01, percent-clipped=0.0 2024-08-14 13:08:31,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2668520.0, ans=0.125 2024-08-14 13:08:33,025 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.23 vs. limit=10.0 2024-08-14 13:08:48,008 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.33 vs. limit=15.0 2024-08-14 13:08:58,435 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.83 vs. limit=12.0 2024-08-14 13:09:09,705 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-14 13:09:10,768 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0662800669670105, model_norm_threshold=50.23476028442383 2024-08-14 13:09:10,990 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.127e+05, grad_sumsq=1.141e+07, orig_rms_sq=9.876e-03 2024-08-14 13:09:11,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2668820.0, ans=0.125 2024-08-14 13:09:13,129 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
26 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 13:09:20,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2668920.0, ans=0.0 2024-08-14 13:09:31,087 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-14 13:09:31,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2668920.0, ans=0.0 2024-08-14 13:09:34,078 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 6050, loss[loss=0.1091, beats_loss=0.009856, ecapa_loss=0.00015, whisper_loss=0.09772, over 21936.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01056, ecapa_loss=0.0001562, whisper_loss=0.09228, over 3877598.38 frames. ], batch size: 87, lr: 3.29e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:09:36,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2669020.0, ans=0.1 2024-08-14 13:09:39,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2669020.0, ans=0.125 2024-08-14 13:09:39,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2669020.0, ans=0.125 2024-08-14 13:09:58,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2669120.0, ans=0.0 2024-08-14 13:10:18,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2669320.0, ans=0.1 2024-08-14 13:10:25,361 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
24 from LS+wenet, 29 from Vox, 24 fro AS 2024-08-14 13:10:33,223 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.350e+00 2024-08-14 13:10:48,941 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 6100, loss[loss=0.1023, beats_loss=0.01156, ecapa_loss=0.0001476, whisper_loss=0.08929, over 21240.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01061, ecapa_loss=0.0001567, whisper_loss=0.09162, over 3880633.46 frames. ], batch size: 88, lr: 3.29e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:10:59,186 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.399e+01 2.783e+01 3.218e+01 7.579e+02, threshold=5.567e+01, percent-clipped=5.0 2024-08-14 13:11:12,698 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-14 13:11:25,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2669720.0, ans=0.0 2024-08-14 13:12:04,178 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 6150, loss[loss=0.1197, beats_loss=0.009704, ecapa_loss=0.0001495, whisper_loss=0.1085, over 22936.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01075, ecapa_loss=0.0001556, whisper_loss=0.09072, over 3882364.19 frames. 
], batch size: 89, lr: 3.29e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:12:26,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2670120.0, ans=0.09899494936611666 2024-08-14 13:12:26,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2670120.0, ans=0.2 2024-08-14 13:12:30,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2670120.0, ans=0.0 2024-08-14 13:12:55,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2670320.0, ans=0.125 2024-08-14 13:12:56,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2670320.0, ans=0.125 2024-08-14 13:13:18,074 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 6200, loss[loss=0.09672, beats_loss=0.01098, ecapa_loss=0.0001952, whisper_loss=0.08379, over 21267.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0108, ecapa_loss=0.0001553, whisper_loss=0.0902, over 3864011.56 frames. ], batch size: 93, lr: 3.29e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:13:25,869 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
31 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 13:13:26,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2670520.0, ans=0.035 2024-08-14 13:13:28,521 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.359e+01 2.589e+01 2.919e+01 1.541e+02, threshold=5.179e+01, percent-clipped=2.0 2024-08-14 13:13:32,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2670620.0, ans=0.125 2024-08-14 13:13:49,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2670720.0, ans=0.2 2024-08-14 13:14:02,131 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0 2024-08-14 13:14:05,053 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.84 vs. limit=15.0 2024-08-14 13:14:09,634 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.55 vs. limit=15.0 2024-08-14 13:14:29,863 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.04 vs. limit=15.0 2024-08-14 13:14:32,079 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 6250, loss[loss=0.1051, beats_loss=0.01173, ecapa_loss=0.0001406, whisper_loss=0.09199, over 17389.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01077, ecapa_loss=0.0001553, whisper_loss=0.09086, over 3880588.57 frames. ], batch size: 68, lr: 3.29e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:14:37,787 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-14 13:14:51,368 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 24 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 13:15:13,951 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-14 13:15:17,560 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.50 vs. limit=15.0 2024-08-14 13:15:24,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2671320.0, ans=0.0 2024-08-14 13:15:33,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2671420.0, ans=0.0 2024-08-14 13:15:45,357 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 6300, loss[loss=0.09298, beats_loss=0.01188, ecapa_loss=0.0001305, whisper_loss=0.0798, over 23194.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01077, ecapa_loss=0.0001555, whisper_loss=0.09026, over 3881757.10 frames. ], batch size: 92, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:15:49,836 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 30 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-14 13:15:54,732 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-14 13:15:57,182 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.285e+01 2.511e+01 2.818e+01 8.993e+01, threshold=5.023e+01, percent-clipped=1.0 2024-08-14 13:16:31,980 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=15.0 2024-08-14 13:16:37,605 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
32 from LS+wenet, 22 from Vox, 34 fro AS
2024-08-14 13:16:57,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2671920.0, ans=0.125
2024-08-14 13:16:58,081 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.60 vs. limit=15.0
2024-08-14 13:17:00,144 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 6350, loss[loss=0.09385, beats_loss=0.01114, ecapa_loss=0.0001702, whisper_loss=0.081, over 18797.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01068, ecapa_loss=0.0001564, whisper_loss=0.09057, over 3859533.24 frames. ], batch size: 77, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 13:17:01,972 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 11 from Vox, 29 fro AS
2024-08-14 13:17:09,824 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 25 from Vox, 39 fro AS
2024-08-14 13:17:31,218 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.11 vs. limit=15.0
2024-08-14 13:17:34,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2672220.0, ans=0.0
2024-08-14 13:17:50,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2672320.0, ans=10.0
2024-08-14 13:17:51,496 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 fro AS
2024-08-14 13:17:57,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2672320.0, ans=0.125
2024-08-14 13:17:57,597 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.37 vs. limit=12.0
2024-08-14 13:18:14,273 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 6400, loss[loss=0.09823, beats_loss=0.01103, ecapa_loss=0.0001709, whisper_loss=0.08549, over 17715.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0107, ecapa_loss=0.0001558, whisper_loss=0.09071, over 3870424.74 frames. ], batch size: 75, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 13:18:24,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2672520.0, ans=0.2
2024-08-14 13:18:25,754 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.344e+01 2.584e+01 2.860e+01 4.850e+01, threshold=5.168e+01, percent-clipped=0.0
2024-08-14 13:18:36,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2672620.0, ans=0.125
2024-08-14 13:19:03,870 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.18 vs. limit=15.0
2024-08-14 13:19:10,271 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 23 from LS+wenet, 19 from Vox, 37 fro AS
2024-08-14 13:19:12,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2672920.0, ans=0.1
2024-08-14 13:19:28,489 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 6450, loss[loss=0.1236, beats_loss=0.01002, ecapa_loss=0.0001635, whisper_loss=0.1119, over 23819.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01067, ecapa_loss=0.0001553, whisper_loss=0.09093, over 3892511.27 frames. ], batch size: 92, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 13:20:01,031 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 14 from Vox, 33 fro AS
2024-08-14 13:20:34,853 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 19 from LS+wenet, 26 from Vox, 26 fro AS
2024-08-14 13:20:41,392 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 6500, loss[loss=0.1027, beats_loss=0.01178, ecapa_loss=0.0001493, whisper_loss=0.08944, over 22119.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01064, ecapa_loss=0.0001545, whisper_loss=0.09241, over 3921527.23 frames. ], batch size: 91, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 13:20:53,344 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.410e+01 2.661e+01 2.982e+01 1.028e+02, threshold=5.322e+01, percent-clipped=1.0
2024-08-14 13:21:25,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2673820.0, ans=0.125
2024-08-14 13:21:29,528 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 18 from Vox, 23 fro AS
2024-08-14 13:21:44,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2673920.0, ans=0.125
2024-08-14 13:21:48,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2673920.0, ans=0.125
2024-08-14 13:21:55,646 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 6550, loss[loss=0.1102, beats_loss=0.01054, ecapa_loss=0.0001725, whisper_loss=0.09791, over 20164.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01061, ecapa_loss=0.000156, whisper_loss=0.09241, over 3894698.77 frames. ], batch size: 81, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 13:21:56,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2674020.0, ans=0.1
2024-08-14 13:22:06,389 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 25 from Vox, 35 fro AS
2024-08-14 13:22:14,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2674120.0, ans=0.0
2024-08-14 13:22:18,272 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 29 from Vox, 36 fro AS
2024-08-14 13:22:22,752 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 23 from Vox, 43 fro AS
2024-08-14 13:22:27,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2674220.0, ans=0.1
2024-08-14 13:22:41,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2674320.0, ans=0.125
2024-08-14 13:22:47,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2674320.0, ans=0.1
2024-08-14 13:22:51,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2674320.0, ans=0.125
2024-08-14 13:23:08,622 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 6600, loss[loss=0.08656, beats_loss=0.01071, ecapa_loss=0.0001483, whisper_loss=0.07436, over 13574.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01061, ecapa_loss=0.0001573, whisper_loss=0.0919, over 3920007.89 frames. ], batch size: 54, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 13:23:12,536 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.66 vs. limit=15.0
2024-08-14 13:23:18,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2674520.0, ans=0.125
2024-08-14 13:23:20,873 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.440e+01 2.726e+01 3.181e+01 5.119e+01, threshold=5.452e+01, percent-clipped=0.0
2024-08-14 13:23:21,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2674520.0, ans=0.1
2024-08-14 13:23:24,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2674620.0, ans=0.1
2024-08-14 13:23:36,984 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 16 from Vox, 28 fro AS
2024-08-14 13:23:39,137 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.62 vs. limit=15.0
2024-08-14 13:24:04,145 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 14 from LS+wenet, 22 from Vox, 25 fro AS
2024-08-14 13:24:21,647 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 6650, loss[loss=0.1009, beats_loss=0.01026, ecapa_loss=0.0001328, whisper_loss=0.08927, over 15439.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01057, ecapa_loss=0.0001567, whisper_loss=0.09213, over 3925267.12 frames. ], batch size: 56, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 13:24:25,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2675020.0, ans=0.125
2024-08-14 13:24:45,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2675120.0, ans=0.125
2024-08-14 13:24:47,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=2675120.0, ans=22.5
2024-08-14 13:25:05,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2675320.0, ans=0.0
2024-08-14 13:25:15,501 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 fro AS
2024-08-14 13:25:21,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2675420.0, ans=0.125
2024-08-14 13:25:26,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2675420.0, ans=0.07
2024-08-14 13:25:28,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2675420.0, ans=0.125
2024-08-14 13:25:35,610 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 6700, loss[loss=0.08627, beats_loss=0.00994, ecapa_loss=0.000145, whisper_loss=0.07488, over 14866.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01062, ecapa_loss=0.0001574, whisper_loss=0.09109, over 3888839.90 frames. ], batch size: 57, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 13:25:46,319 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 19 from Vox, 27 fro AS
2024-08-14 13:25:47,500 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.392e+01 2.630e+01 2.889e+01 1.018e+02, threshold=5.259e+01, percent-clipped=2.0
2024-08-14 13:26:06,683 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 15 from Vox, 30 fro AS
2024-08-14 13:26:11,197 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 19 from Vox, 33 fro AS
2024-08-14 13:26:11,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2675720.0, ans=0.0
2024-08-14 13:26:46,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2675920.0, ans=0.2
2024-08-14 13:26:49,831 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 6750, loss[loss=0.1158, beats_loss=0.01075, ecapa_loss=0.0001697, whisper_loss=0.1034, over 18671.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01069, ecapa_loss=0.000158, whisper_loss=0.0905, over 3873955.22 frames. ], batch size: 75, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 13:26:56,385 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 21 from LS+wenet, 18 from Vox, 21 fro AS
2024-08-14 13:27:15,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2676120.0, ans=0.125
2024-08-14 13:27:28,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2676220.0, ans=0.0
2024-08-14 13:27:40,952 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.88 vs. limit=15.0
2024-08-14 13:27:43,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2676320.0, ans=0.0
2024-08-14 13:27:44,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2676320.0, ans=0.2
2024-08-14 13:27:54,783 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0
2024-08-14 13:27:56,221 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.91 vs. limit=15.0
2024-08-14 13:28:02,255 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 6800, loss[loss=0.1127, beats_loss=0.01003, ecapa_loss=0.0001815, whisper_loss=0.1009, over 20857.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01068, ecapa_loss=0.0001588, whisper_loss=0.09052, over 3886563.53 frames. ], batch size: 84, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 13:28:13,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2676520.0, ans=0.09899494936611666
2024-08-14 13:28:14,527 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.378e+01 2.676e+01 3.043e+01 8.013e+01, threshold=5.353e+01, percent-clipped=1.0
2024-08-14 13:28:17,800 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 22 from Vox, 21 fro AS
2024-08-14 13:28:33,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2676720.0, ans=0.125
2024-08-14 13:28:43,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2676720.0, ans=0.0
2024-08-14 13:29:04,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2676920.0, ans=0.125
2024-08-14 13:29:10,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2676920.0, ans=0.125
2024-08-14 13:29:14,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=2676920.0, ans=0.025
2024-08-14 13:29:17,009 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 6850, loss[loss=0.08757, beats_loss=0.01356, ecapa_loss=0.0001433, whisper_loss=0.07257, over 16149.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01067, ecapa_loss=0.0001583, whisper_loss=0.09013, over 3868620.47 frames. ], batch size: 65, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 13:29:29,954 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 16 from Vox, 31 fro AS
2024-08-14 13:29:39,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2677120.0, ans=0.1
2024-08-14 13:29:44,047 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.70 vs. limit=22.5
2024-08-14 13:29:46,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2677220.0, ans=0.125
2024-08-14 13:29:51,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=2677220.0, ans=15.0
2024-08-14 13:30:27,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2677520.0, ans=0.2
2024-08-14 13:30:28,609 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 6900, loss[loss=0.1206, beats_loss=0.01009, ecapa_loss=0.0001124, whisper_loss=0.1094, over 21884.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01074, ecapa_loss=0.0001569, whisper_loss=0.08963, over 3844855.52 frames. ], batch size: 81, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 13:30:34,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2677520.0, ans=0.125
2024-08-14 13:30:39,658 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.298e+01 2.502e+01 2.840e+01 6.631e+01, threshold=5.005e+01, percent-clipped=1.0
2024-08-14 13:30:41,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2677620.0, ans=0.125
2024-08-14 13:30:57,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2677720.0, ans=0.0
2024-08-14 13:30:57,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2677720.0, ans=0.07
2024-08-14 13:31:11,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2677820.0, ans=0.125
2024-08-14 13:31:16,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2677820.0, ans=0.2
2024-08-14 13:31:21,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=2677820.0, ans=0.2
2024-08-14 13:31:39,237 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 6950, loss[loss=0.1051, beats_loss=0.009971, ecapa_loss=0.0001392, whisper_loss=0.09371, over 20964.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01072, ecapa_loss=0.0001566, whisper_loss=0.09071, over 3871667.18 frames. ], batch size: 81, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 13:31:46,461 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 22 from Vox, 31 fro AS
2024-08-14 13:31:48,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2678020.0, ans=0.04949747468305833
2024-08-14 13:31:48,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2678020.0, ans=0.0
2024-08-14 13:31:57,647 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.21 vs. limit=15.0
2024-08-14 13:31:58,959 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.37 vs. limit=15.0
2024-08-14 13:32:24,456 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 14 from Vox, 29 fro AS
2024-08-14 13:32:32,899 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 38 from LS+wenet, 22 from Vox, 29 fro AS
2024-08-14 13:32:50,670 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 7000, loss[loss=0.1074, beats_loss=0.01078, ecapa_loss=0.0001256, whisper_loss=0.09537, over 18986.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01072, ecapa_loss=0.0001562, whisper_loss=0.09088, over 3837714.33 frames. ], batch size: 73, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 13:32:59,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2678520.0, ans=0.125
2024-08-14 13:33:01,974 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.701e+01 2.255e+01 2.474e+01 2.854e+01 4.338e+01, threshold=4.947e+01, percent-clipped=0.0
2024-08-14 13:33:08,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2678620.0, ans=0.125
2024-08-14 13:33:15,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2678620.0, ans=0.125
2024-08-14 13:33:24,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2678720.0, ans=0.125
2024-08-14 13:33:34,596 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.27 vs. limit=22.5
2024-08-14 13:33:35,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2678820.0, ans=0.125
2024-08-14 13:33:45,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2678820.0, ans=0.0
2024-08-14 13:33:48,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2678920.0, ans=0.125
2024-08-14 13:33:57,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2678920.0, ans=0.125
2024-08-14 13:34:01,700 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 7050, loss[loss=0.09627, beats_loss=0.01065, ecapa_loss=0.0001406, whisper_loss=0.08422, over 22582.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01078, ecapa_loss=0.0001567, whisper_loss=0.09019, over 3840912.60 frames. ], batch size: 89, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 13:34:01,960 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 18 from Vox, 42 fro AS
2024-08-14 13:34:05,887 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 34 from LS+wenet, 23 from Vox, 37 fro AS
2024-08-14 13:34:25,308 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0
2024-08-14 13:34:41,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2679220.0, ans=0.5
2024-08-14 13:34:48,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2679320.0, ans=0.2
2024-08-14 13:34:51,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2679320.0, ans=0.125
2024-08-14 13:34:55,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2679320.0, ans=0.125
2024-08-14 13:35:01,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2679420.0, ans=0.0
2024-08-14 13:35:02,322 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 16 from Vox, 40 fro AS
2024-08-14 13:35:02,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2679420.0, ans=0.125
2024-08-14 13:35:02,683 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=15.0
2024-08-14 13:35:06,190 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 19 from Vox, 30 fro AS
2024-08-14 13:35:11,096 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.240e+05
2024-08-14 13:35:13,272 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 7100, loss[loss=0.08862, beats_loss=0.009972, ecapa_loss=0.000158, whisper_loss=0.07706, over 13980.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01076, ecapa_loss=0.0001562, whisper_loss=0.09045, over 3847796.90 frames. ], batch size: 54, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 13:35:24,216 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.38 vs. limit=22.5
2024-08-14 13:35:24,642 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.302e+01 2.502e+01 2.737e+01 3.925e+01, threshold=5.005e+01, percent-clipped=0.0
2024-08-14 13:35:26,423 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 21 from Vox, 23 fro AS
2024-08-14 13:35:28,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2679620.0, ans=0.0
2024-08-14 13:36:00,308 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 25 from LS+wenet, 18 from Vox, 25 fro AS
2024-08-14 13:36:03,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2679820.0, ans=0.95
2024-08-14 13:36:04,341 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 23 from Vox, 43 fro AS
2024-08-14 13:36:22,597 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 34 from LS+wenet, 21 from Vox, 32 fro AS
2024-08-14 13:36:27,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2679920.0, ans=0.125
2024-08-14 13:36:30,067 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 7150, loss[loss=0.09819, beats_loss=0.00974, ecapa_loss=0.0001573, whisper_loss=0.08687, over 17502.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01077, ecapa_loss=0.0001552, whisper_loss=0.09082, over 3889555.40 frames. ], batch size: 71, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 13:36:34,135 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=15.0
2024-08-14 13:36:46,205 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 23 from Vox, 36 fro AS
2024-08-14 13:36:48,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2680120.0, ans=0.0
2024-08-14 13:37:05,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2680120.0, ans=0.1
2024-08-14 13:37:22,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2680320.0, ans=0.05
2024-08-14 13:37:27,973 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 29 from LS+wenet, 18 from Vox, 27 fro AS
2024-08-14 13:37:32,609 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 fro AS
2024-08-14 13:37:35,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=2680320.0, ans=0.05
2024-08-14 13:37:37,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2680420.0, ans=0.2
2024-08-14 13:37:38,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2680420.0, ans=0.1
2024-08-14 13:37:52,767 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 7200, loss[loss=0.09864, beats_loss=0.01248, ecapa_loss=0.0001524, whisper_loss=0.08463, over 20920.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01074, ecapa_loss=0.0001549, whisper_loss=0.09085, over 3898406.51 frames. ], batch size: 87, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 13:37:59,014 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 41 from LS+wenet, 20 from Vox, 28 fro AS
2024-08-14 13:38:04,360 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.341e+01 2.648e+01 2.948e+01 9.250e+01, threshold=5.295e+01, percent-clipped=2.0
2024-08-14 13:38:41,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2680820.0, ans=0.125
2024-08-14 13:38:48,839 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 23 from LS+wenet, 15 from Vox, 25 fro AS
2024-08-14 13:38:49,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2680820.0, ans=0.125
2024-08-14 13:39:03,619 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 25 from Vox, 21 fro AS
2024-08-14 13:39:05,849 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.56 vs. limit=22.5
2024-08-14 13:39:07,583 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 7250, loss[loss=0.08819, beats_loss=0.01194, ecapa_loss=0.0001518, whisper_loss=0.07474, over 17608.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01076, ecapa_loss=0.0001558, whisper_loss=0.09064, over 3891656.82 frames. ], batch size: 71, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 13:39:20,353 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 14 from Vox, 26 fro AS
2024-08-14 13:39:31,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2681120.0, ans=0.125
2024-08-14 13:39:35,241 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 16 from Vox, 26 fro AS
2024-08-14 13:39:47,168 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 19 from Vox, 29 fro AS
2024-08-14 13:40:21,215 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 7300, loss[loss=0.1197, beats_loss=0.009515, ecapa_loss=0.000156, whisper_loss=0.1087, over 20657.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0107, ecapa_loss=0.0001567, whisper_loss=0.09055, over 3874146.11 frames. ], batch size: 79, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 13:40:33,350 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.289e+01 2.573e+01 2.951e+01 1.378e+02, threshold=5.146e+01, percent-clipped=1.0
2024-08-14 13:40:53,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2681720.0, ans=0.0
2024-08-14 13:41:01,254 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 24 from LS+wenet, 28 from Vox, 31 fro AS
2024-08-14 13:41:16,593 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 21 from Vox, 31 fro AS
2024-08-14 13:41:36,487 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 7350, loss[loss=0.1083, beats_loss=0.0115, ecapa_loss=0.0001563, whisper_loss=0.0952, over 20981.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01068, ecapa_loss=0.0001567, whisper_loss=0.09132, over 3899825.48 frames. ], batch size: 82, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 13:41:38,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2682020.0, ans=0.1
2024-08-14 13:41:49,678 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 13 from Vox, 26 fro AS
2024-08-14 13:41:54,268 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 22 from Vox, 33 fro AS
2024-08-14 13:42:02,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2682120.0, ans=0.0
2024-08-14 13:42:12,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2682220.0, ans=0.125
2024-08-14 13:42:21,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2682320.0, ans=0.0
2024-08-14 13:42:25,412 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 15 from LS+wenet, 21 from Vox, 31 fro AS
2024-08-14 13:42:26,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2682320.0, ans=15.0
2024-08-14 13:42:29,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2682320.0, ans=0.0
2024-08-14 13:42:37,413 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 30 from LS+wenet, 25 from Vox, 40 fro AS
2024-08-14 13:42:50,693 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 7400, loss[loss=0.1095, beats_loss=0.008269, ecapa_loss=0.0001487, whisper_loss=0.09975, over 19004.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01068, ecapa_loss=0.0001558, whisper_loss=0.09126, over 3891059.06 frames. ], batch size: 68, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 13:43:02,169 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.321e+01 2.551e+01 2.887e+01 1.021e+02, threshold=5.101e+01, percent-clipped=1.0
2024-08-14 13:43:02,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2682520.0, ans=0.125
2024-08-14 13:43:05,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2682620.0, ans=0.2
2024-08-14 13:43:07,479 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.92 vs. limit=15.0
2024-08-14 13:43:11,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2682620.0, ans=0.125
2024-08-14 13:43:18,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2682720.0, ans=0.125
2024-08-14 13:43:25,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2682720.0, ans=0.0
2024-08-14 13:43:28,321 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 fro AS
2024-08-14 13:43:29,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2682720.0, ans=0.125
2024-08-14 13:43:47,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2682920.0, ans=0.1
2024-08-14 13:43:56,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2682920.0, ans=0.125
2024-08-14 13:44:00,185 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.157e+05
2024-08-14 13:44:02,380 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 7450, loss[loss=0.1248, beats_loss=0.009742, ecapa_loss=0.0001274, whisper_loss=0.1138, over 20496.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01064, ecapa_loss=0.0001561, whisper_loss=0.09133, over 3863847.32 frames. ], batch size: 79, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 13:44:17,001 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 16 from Vox, 32 fro AS
2024-08-14 13:44:18,741 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 fro AS
2024-08-14 13:44:20,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2683120.0, ans=0.2
2024-08-14 13:44:22,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2683120.0, ans=0.125
2024-08-14 13:44:23,226 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 17 from Vox, 35 fro AS
2024-08-14 13:44:35,753 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=15.0
2024-08-14 13:44:38,633 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.27 vs. limit=15.0
2024-08-14 13:44:38,788 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.22 vs. limit=8.0
2024-08-14 13:44:52,517 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 19 from Vox, 28 fro AS
2024-08-14 13:45:16,414 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 7500, loss[loss=0.09323, beats_loss=0.01299, ecapa_loss=0.0001422, whisper_loss=0.07882, over 21843.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01074, ecapa_loss=0.0001565, whisper_loss=0.09011, over 3836908.68 frames. ], batch size: 88, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 13:45:28,153 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.296e+01 2.546e+01 2.865e+01 4.082e+01, threshold=5.092e+01, percent-clipped=0.0
2024-08-14 13:45:28,469 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 17 from Vox, 29 fro AS
2024-08-14 13:45:30,103 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 18 from Vox, 30 fro AS
2024-08-14 13:45:45,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2683720.0, ans=0.125
2024-08-14 13:46:02,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2683820.0, ans=0.125
2024-08-14 13:46:05,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2683820.0, ans=0.0
2024-08-14 13:46:16,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2683920.0, ans=0.0
2024-08-14 13:46:18,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2683920.0, ans=0.0
2024-08-14 13:46:32,469 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 7550, loss[loss=0.09644, beats_loss=0.009273, ecapa_loss=0.000175, whisper_loss=0.08542, over 17969.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01073, ecapa_loss=0.0001564, whisper_loss=0.09013, over 3820814.24 frames. ], batch size: 70, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 13:46:40,103 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 29 from LS+wenet, 12 from Vox, 37 fro AS
2024-08-14 13:46:52,462 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 15 from LS+wenet, 20 from Vox, 25 fro AS
2024-08-14 13:46:52,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2684120.0, ans=0.125
2024-08-14 13:47:46,611 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 7600, loss[loss=0.1056, beats_loss=0.0104, ecapa_loss=0.0001739, whisper_loss=0.09341, over 22027.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01071, ecapa_loss=0.0001567, whisper_loss=0.08985, over 3801878.51 frames.
], batch size: 93, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:47:49,849 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-14 13:47:55,261 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.00 vs. limit=10.0 2024-08-14 13:47:58,877 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.371e+01 2.546e+01 2.782e+01 5.094e+01, threshold=5.091e+01, percent-clipped=1.0 2024-08-14 13:48:01,045 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.176e+01 2024-08-14 13:48:01,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2684620.0, ans=0.1 2024-08-14 13:48:03,440 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 13:48:03,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2684620.0, ans=0.1 2024-08-14 13:48:12,258 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 13:48:24,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2684720.0, ans=0.1 2024-08-14 13:48:29,753 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-14 13:48:35,096 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.72 vs. limit=15.0 2024-08-14 13:48:55,217 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.93 vs. 
limit=10.0 2024-08-14 13:48:58,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2684920.0, ans=0.025 2024-08-14 13:49:00,493 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 7650, loss[loss=0.1056, beats_loss=0.008462, ecapa_loss=0.0002179, whisper_loss=0.09501, over 15420.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01063, ecapa_loss=0.0001576, whisper_loss=0.08988, over 3848733.61 frames. ], batch size: 65, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:49:12,754 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 13:49:17,654 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.12 vs. limit=15.0 2024-08-14 13:49:36,497 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-14 13:49:46,192 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 13:49:49,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2685320.0, ans=0.125 2024-08-14 13:49:58,017 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 14 from Vox, 51 fro AS 2024-08-14 13:50:13,945 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 7700, loss[loss=0.1219, beats_loss=0.01001, ecapa_loss=0.0001695, whisper_loss=0.1102, over 22418.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01063, ecapa_loss=0.0001556, whisper_loss=0.09023, over 3860358.12 frames. ], batch size: 88, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:50:15,142 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.13 vs. 
limit=15.0 2024-08-14 13:50:19,767 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0 2024-08-14 13:50:25,913 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.559e+01 2.371e+01 2.640e+01 3.039e+01 4.657e+01, threshold=5.281e+01, percent-clipped=0.0 2024-08-14 13:50:27,735 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-14 13:50:29,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2685620.0, ans=0.125 2024-08-14 13:51:10,038 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.49 vs. limit=22.5 2024-08-14 13:51:13,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2685920.0, ans=0.125 2024-08-14 13:51:13,954 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-14 13:51:26,532 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 7750, loss[loss=0.114, beats_loss=0.007971, ecapa_loss=0.0001645, whisper_loss=0.1044, over 22371.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01066, ecapa_loss=0.000155, whisper_loss=0.09007, over 3880836.39 frames. ], batch size: 88, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:51:33,662 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 13:51:44,195 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.08 vs. 
limit=15.0 2024-08-14 13:51:48,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2686120.0, ans=0.0 2024-08-14 13:51:54,054 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 13:52:16,958 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.70 vs. limit=12.0 2024-08-14 13:52:37,974 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-14 13:52:40,692 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 7800, loss[loss=0.09464, beats_loss=0.00968, ecapa_loss=0.0001595, whisper_loss=0.08337, over 13624.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01061, ecapa_loss=0.0001549, whisper_loss=0.09089, over 3896462.71 frames. ], batch size: 54, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:52:41,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2686520.0, ans=0.125 2024-08-14 13:52:44,368 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 13:52:52,230 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.962e+01 2.425e+01 2.611e+01 2.883e+01 9.855e+01, threshold=5.222e+01, percent-clipped=1.0 2024-08-14 13:53:03,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2686620.0, ans=0.125 2024-08-14 13:53:06,645 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.68 vs. 
limit=15.0 2024-08-14 13:53:16,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2686720.0, ans=0.1 2024-08-14 13:53:26,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2686820.0, ans=0.125 2024-08-14 13:53:48,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2686920.0, ans=0.025 2024-08-14 13:53:50,106 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-14 13:53:51,582 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 13:53:54,561 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 7850, loss[loss=0.06938, beats_loss=0.0137, ecapa_loss=0.0001693, whisper_loss=0.05399, over 17442.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0106, ecapa_loss=0.0001543, whisper_loss=0.09125, over 3903371.68 frames. ], batch size: 75, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:54:00,539 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.425e+05 2024-08-14 13:54:21,545 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.52 vs. limit=15.0 2024-08-14 13:55:08,976 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 7900, loss[loss=0.1143, beats_loss=0.01024, ecapa_loss=0.0001441, whisper_loss=0.1026, over 18541.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01068, ecapa_loss=0.0001534, whisper_loss=0.09141, over 3901388.96 frames. 
], batch size: 71, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:55:11,572 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.10 vs. limit=22.5 2024-08-14 13:55:20,566 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.378e+01 2.612e+01 2.895e+01 1.059e+02, threshold=5.225e+01, percent-clipped=1.0 2024-08-14 13:55:33,786 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.63 vs. limit=15.0 2024-08-14 13:55:36,687 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.47 vs. limit=6.0 2024-08-14 13:55:44,597 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-14 13:56:19,004 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 22 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-14 13:56:22,934 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 7950, loss[loss=0.0947, beats_loss=0.011, ecapa_loss=0.0001318, whisper_loss=0.08239, over 19239.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01065, ecapa_loss=0.0001544, whisper_loss=0.09146, over 3878683.44 frames. ], batch size: 75, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:56:25,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2688020.0, ans=0.2 2024-08-14 13:56:26,166 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 13:56:29,978 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.45 vs. limit=15.0 2024-08-14 13:57:21,408 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
14 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-14 13:57:21,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2688420.0, ans=0.1 2024-08-14 13:57:26,623 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.03 vs. limit=8.0 2024-08-14 13:57:28,548 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.34 vs. limit=15.0 2024-08-14 13:57:30,616 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-14 13:57:37,567 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 8000, loss[loss=0.1044, beats_loss=0.01062, ecapa_loss=0.0001441, whisper_loss=0.09232, over 19414.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01067, ecapa_loss=0.0001536, whisper_loss=0.0917, over 3900470.72 frames. ], batch size: 77, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:57:39,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2688520.0, ans=0.0 2024-08-14 13:57:48,987 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.382e+01 2.629e+01 3.053e+01 3.860e+01, threshold=5.259e+01, percent-clipped=0.0 2024-08-14 13:57:55,085 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 13:57:59,810 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-14 13:58:06,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2688720.0, ans=0.2 2024-08-14 13:58:10,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2688720.0, ans=0.0 2024-08-14 13:58:14,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2688720.0, ans=0.025 2024-08-14 13:58:16,509 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.281e+00 2024-08-14 13:58:24,884 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-14 13:58:32,690 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.15 vs. limit=12.0 2024-08-14 13:58:47,606 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.13 vs. limit=15.0 2024-08-14 13:58:49,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2689020.0, ans=0.125 2024-08-14 13:58:50,864 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 8050, loss[loss=0.1039, beats_loss=0.009893, ecapa_loss=0.0001505, whisper_loss=0.09253, over 18832.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01064, ecapa_loss=0.0001551, whisper_loss=0.09156, over 3894246.00 frames. ], batch size: 74, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:58:51,175 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
30 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-14 13:58:53,278 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.59 vs. limit=15.0 2024-08-14 13:59:05,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2689120.0, ans=0.125 2024-08-14 13:59:05,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2689120.0, ans=0.0 2024-08-14 13:59:08,343 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.69 vs. limit=15.0 2024-08-14 13:59:08,885 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-14 13:59:20,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2689220.0, ans=0.125 2024-08-14 13:59:48,990 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.73 vs. limit=15.0 2024-08-14 14:00:00,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2689420.0, ans=0.1 2024-08-14 14:00:03,721 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 8100, loss[loss=0.1074, beats_loss=0.01253, ecapa_loss=0.0001043, whisper_loss=0.09381, over 24549.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01067, ecapa_loss=0.0001558, whisper_loss=0.09091, over 3898153.04 frames. 
], batch size: 93, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 14:00:15,745 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.342e+01 2.614e+01 2.950e+01 9.116e+01, threshold=5.228e+01, percent-clipped=3.0 2024-08-14 14:00:21,168 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.01 vs. limit=6.0 2024-08-14 14:00:26,072 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-14 14:00:26,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2689620.0, ans=0.07 2024-08-14 14:00:41,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2689720.0, ans=0.0 2024-08-14 14:00:42,194 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-14 14:00:51,658 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.01 vs. limit=12.0 2024-08-14 14:00:55,338 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-14 14:01:02,412 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-14 14:01:03,726 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 14:01:15,781 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 8150, loss[loss=0.1014, beats_loss=0.008905, ecapa_loss=0.0001513, whisper_loss=0.091, over 22201.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0107, ecapa_loss=0.0001552, whisper_loss=0.09068, over 3923018.00 frames. 
], batch size: 89, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 14:02:01,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2690320.0, ans=0.1 2024-08-14 14:02:09,043 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-14 14:02:18,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2690420.0, ans=0.2 2024-08-14 14:02:21,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2690420.0, ans=0.07 2024-08-14 14:02:22,986 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.26 vs. limit=15.0 2024-08-14 14:02:28,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2690520.0, ans=0.0 2024-08-14 14:02:29,242 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 8200, loss[loss=0.1079, beats_loss=0.01067, ecapa_loss=0.0001532, whisper_loss=0.09567, over 23078.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01069, ecapa_loss=0.0001558, whisper_loss=0.09051, over 3917283.67 frames. ], batch size: 92, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 14:02:32,174 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
35 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-14 14:02:40,545 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.302e+01 2.493e+01 2.763e+01 4.005e+01, threshold=4.986e+01, percent-clipped=0.0 2024-08-14 14:02:49,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2690620.0, ans=0.125 2024-08-14 14:02:51,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2690620.0, ans=0.04949747468305833 2024-08-14 14:02:52,718 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-14 14:03:00,135 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 13 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-14 14:03:04,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2690720.0, ans=0.125 2024-08-14 14:03:14,504 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 13 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-14 14:03:17,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2690820.0, ans=0.1 2024-08-14 14:03:27,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2690920.0, ans=0.0 2024-08-14 14:03:27,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2690920.0, ans=0.125 2024-08-14 14:03:42,218 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 8250, loss[loss=0.1175, beats_loss=0.01083, ecapa_loss=0.0001838, whisper_loss=0.1048, over 21996.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01074, ecapa_loss=0.0001555, whisper_loss=0.08994, over 3880922.05 frames. 
], batch size: 91, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 14:03:47,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2691020.0, ans=0.5 2024-08-14 14:04:00,505 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 31 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-14 14:04:22,962 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-14 14:04:23,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2691220.0, ans=0.1 2024-08-14 14:04:27,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2691320.0, ans=0.125 2024-08-14 14:04:38,503 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.95 vs. limit=12.0 2024-08-14 14:04:40,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2691420.0, ans=0.2 2024-08-14 14:04:41,248 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.01 vs. limit=10.0 2024-08-14 14:04:56,476 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 8300, loss[loss=0.08642, beats_loss=0.01183, ecapa_loss=0.0001754, whisper_loss=0.07283, over 16504.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01072, ecapa_loss=0.0001548, whisper_loss=0.09061, over 3886732.20 frames. 
], batch size: 71, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:05:00,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2691520.0, ans=0.125 2024-08-14 14:05:08,152 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.406e+01 2.618e+01 2.998e+01 6.409e+01, threshold=5.237e+01, percent-clipped=1.0 2024-08-14 14:05:08,481 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 14:05:29,246 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 14:05:31,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2691720.0, ans=0.125 2024-08-14 14:05:42,366 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 28 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-14 14:05:51,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2691820.0, ans=0.125 2024-08-14 14:06:01,937 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 14:06:03,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2691920.0, ans=0.125 2024-08-14 14:06:03,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2691920.0, ans=0.05 2024-08-14 14:06:09,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2692020.0, ans=0.5 2024-08-14 14:06:10,679 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 8350, loss[loss=0.1062, beats_loss=0.01014, ecapa_loss=0.0001828, whisper_loss=0.09422, over 22101.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.0107, ecapa_loss=0.0001563, whisper_loss=0.09054, over 3908295.55 frames. ], batch size: 91, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:06:28,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2692120.0, ans=10.0 2024-08-14 14:06:29,414 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 31 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-14 14:06:37,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2692120.0, ans=0.0 2024-08-14 14:06:39,600 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 14:06:59,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2692320.0, ans=0.125 2024-08-14 14:07:06,715 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 14:07:23,847 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 14:07:27,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2692420.0, ans=0.1 2024-08-14 14:07:30,297 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 8400, loss[loss=0.09199, beats_loss=0.01215, ecapa_loss=0.0001428, whisper_loss=0.07841, over 15721.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01066, ecapa_loss=0.0001564, whisper_loss=0.09076, over 3922089.99 frames. ], batch size: 62, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:07:41,827 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
16 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-14 14:07:43,271 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.388e+01 2.632e+01 2.972e+01 1.432e+02, threshold=5.263e+01, percent-clipped=3.0 2024-08-14 14:07:47,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2692620.0, ans=0.0 2024-08-14 14:07:54,614 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 29 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 14:07:58,966 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 35 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-14 14:08:09,745 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-14 14:08:26,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2692820.0, ans=0.1 2024-08-14 14:08:30,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2692820.0, ans=0.125 2024-08-14 14:08:38,898 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.41 vs. limit=15.0 2024-08-14 14:08:46,295 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-14 14:08:48,898 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 8450, loss[loss=0.1238, beats_loss=0.008237, ecapa_loss=0.0001943, whisper_loss=0.1136, over 23015.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01053, ecapa_loss=0.000158, whisper_loss=0.09148, over 3902084.52 frames. 
], batch size: 93, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:09:04,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2693120.0, ans=0.0 2024-08-14 14:09:12,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2693120.0, ans=0.125 2024-08-14 14:09:19,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2693220.0, ans=0.125 2024-08-14 14:09:22,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2693220.0, ans=0.125 2024-08-14 14:09:24,773 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 25 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-14 14:09:25,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2693220.0, ans=0.125 2024-08-14 14:09:28,023 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 14:09:28,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2693220.0, ans=0.04949747468305833 2024-08-14 14:09:28,683 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.75 vs. limit=15.0 2024-08-14 14:09:30,763 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 33 from LS+wenet, 11 from Vox, 36 fro AS 2024-08-14 14:09:35,883 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-14 14:09:50,218 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 17 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-14 14:09:51,953 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
17 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-14 14:09:55,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2693420.0, ans=0.125 2024-08-14 14:09:57,566 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 14 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-14 14:10:03,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2693420.0, ans=0.125 2024-08-14 14:10:04,952 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 34 from LS+wenet, 30 from Vox, 23 fro AS 2024-08-14 14:10:06,666 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 8500, loss[loss=0.1311, beats_loss=0.006615, ecapa_loss=0.0001898, whisper_loss=0.1226, over 21655.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01059, ecapa_loss=0.0001568, whisper_loss=0.09078, over 3872779.59 frames. ], batch size: 87, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:10:14,443 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.02 vs. limit=15.0 2024-08-14 14:10:19,603 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.292e+01 2.601e+01 3.025e+01 1.070e+02, threshold=5.203e+01, percent-clipped=1.0 2024-08-14 14:10:19,779 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-14 14:10:25,677 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-14 14:10:27,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2693620.0, ans=0.125 2024-08-14 14:10:58,396 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.70 vs. 
limit=10.0 2024-08-14 14:11:08,805 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-14 14:11:21,360 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 29 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-14 14:11:21,953 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.67 vs. limit=15.0 2024-08-14 14:11:23,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2693920.0, ans=0.125 2024-08-14 14:11:24,261 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-14 14:11:27,241 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 8550, loss[loss=0.1029, beats_loss=0.009588, ecapa_loss=0.0001296, whisper_loss=0.092, over 20892.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01048, ecapa_loss=0.0001567, whisper_loss=0.09195, over 3885820.57 frames. ], batch size: 78, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:11:36,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2694020.0, ans=0.1 2024-08-14 14:11:36,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2694020.0, ans=0.2 2024-08-14 14:11:46,517 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-14 14:11:59,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2694220.0, ans=0.125 2024-08-14 14:12:04,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2694220.0, ans=0.5 2024-08-14 14:12:22,008 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.36 vs. 
limit=15.0 2024-08-14 14:12:29,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2694420.0, ans=10.0 2024-08-14 14:12:29,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2694420.0, ans=0.125 2024-08-14 14:12:41,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2694420.0, ans=0.0 2024-08-14 14:12:45,526 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 8600, loss[loss=0.07982, beats_loss=0.01249, ecapa_loss=0.0001315, whisper_loss=0.06602, over 21319.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01061, ecapa_loss=0.0001546, whisper_loss=0.09186, over 3889078.13 frames. ], batch size: 84, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:12:57,641 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.714e+01 2.473e+01 2.757e+01 3.150e+01 4.170e+01, threshold=5.513e+01, percent-clipped=0.0 2024-08-14 14:13:21,765 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.33 vs. limit=15.0 2024-08-14 14:13:23,800 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-14 14:13:46,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2694920.0, ans=0.125 2024-08-14 14:13:50,921 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 14:13:55,589 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
22 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-14 14:13:57,559 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 14:14:03,556 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 8650, loss[loss=0.09971, beats_loss=0.009043, ecapa_loss=0.0001864, whisper_loss=0.0888, over 13938.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01073, ecapa_loss=0.0001546, whisper_loss=0.09057, over 3881449.19 frames. ], batch size: 58, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:14:10,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2695020.0, ans=0.125 2024-08-14 14:14:15,050 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=15.0 2024-08-14 14:14:49,630 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 21 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-14 14:14:55,474 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 27 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-14 14:15:18,742 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 8700, loss[loss=0.07112, beats_loss=0.01392, ecapa_loss=0.0001125, whisper_loss=0.05607, over 17903.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01066, ecapa_loss=0.0001552, whisper_loss=0.09077, over 3845160.03 frames. 
], batch size: 72, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:15:30,386 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.361e+01 2.667e+01 2.943e+01 6.389e+01, threshold=5.334e+01, percent-clipped=1.0 2024-08-14 14:15:55,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2695720.0, ans=0.1 2024-08-14 14:15:57,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2695720.0, ans=0.0 2024-08-14 14:16:00,030 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 14:16:10,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2695820.0, ans=0.1 2024-08-14 14:16:12,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2695820.0, ans=0.1 2024-08-14 14:16:31,795 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 8750, loss[loss=0.08845, beats_loss=0.008612, ecapa_loss=0.0001406, whisper_loss=0.07843, over 16486.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01067, ecapa_loss=0.000156, whisper_loss=0.09027, over 3839923.79 frames. ], batch size: 60, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:16:38,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2696020.0, ans=0.2 2024-08-14 14:16:41,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2696020.0, ans=0.1 2024-08-14 14:16:42,551 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
27 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-14 14:16:49,261 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.69 vs. limit=22.5 2024-08-14 14:16:53,300 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.12 vs. limit=22.5 2024-08-14 14:17:20,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2696320.0, ans=0.2 2024-08-14 14:17:22,824 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 14:17:29,763 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-14 14:17:34,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2696420.0, ans=0.125 2024-08-14 14:17:39,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2696420.0, ans=0.125 2024-08-14 14:17:39,987 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-14 14:17:41,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2696420.0, ans=0.125 2024-08-14 14:17:44,087 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 8800, loss[loss=0.1034, beats_loss=0.008514, ecapa_loss=0.0001768, whisper_loss=0.09308, over 13740.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01066, ecapa_loss=0.0001558, whisper_loss=0.09037, over 3814765.90 frames. ], batch size: 55, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:17:45,855 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 14:17:52,457 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.32 vs. limit=15.0 2024-08-14 14:17:55,732 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.470e+01 2.757e+01 3.014e+01 7.462e+01, threshold=5.513e+01, percent-clipped=1.0 2024-08-14 14:17:56,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2696520.0, ans=0.125 2024-08-14 14:17:58,809 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-14 14:18:39,981 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.10 vs. limit=10.0 2024-08-14 14:18:52,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2696920.0, ans=0.2 2024-08-14 14:18:58,406 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 8850, loss[loss=0.1123, beats_loss=0.006985, ecapa_loss=0.0001563, whisper_loss=0.1037, over 19103.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01071, ecapa_loss=0.0001549, whisper_loss=0.09058, over 3843148.72 frames. ], batch size: 71, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:19:00,457 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 14:19:04,712 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-14 14:19:07,611 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 39 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-14 14:19:16,090 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 14:19:19,027 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
19 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-14 14:19:22,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2697120.0, ans=0.1 2024-08-14 14:19:25,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2697120.0, ans=0.09899494936611666 2024-08-14 14:19:30,699 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 22 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-14 14:19:38,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2697220.0, ans=0.125 2024-08-14 14:20:11,704 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 8900, loss[loss=0.1006, beats_loss=0.008773, ecapa_loss=0.000142, whisper_loss=0.09043, over 17407.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01068, ecapa_loss=0.0001554, whisper_loss=0.09035, over 3824985.13 frames. ], batch size: 67, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:20:15,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2697520.0, ans=0.125 2024-08-14 14:20:23,565 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.627e+01 2.296e+01 2.497e+01 2.712e+01 4.460e+01, threshold=4.994e+01, percent-clipped=0.0 2024-08-14 14:20:36,965 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-14 14:20:59,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2697820.0, ans=0.1 2024-08-14 14:21:05,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2697820.0, ans=0.125 2024-08-14 14:21:12,494 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-14 14:21:25,803 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 8950, loss[loss=0.1172, beats_loss=0.009512, ecapa_loss=0.0001235, whisper_loss=0.1064, over 17785.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01067, ecapa_loss=0.0001548, whisper_loss=0.09089, over 3829842.02 frames. ], batch size: 65, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:21:26,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2698020.0, ans=0.025 2024-08-14 14:21:26,803 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.48 vs. limit=12.0 2024-08-14 14:21:27,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2698020.0, ans=0.1 2024-08-14 14:21:29,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2698020.0, ans=0.125 2024-08-14 14:21:41,139 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-14 14:21:49,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2698120.0, ans=0.0 2024-08-14 14:22:05,599 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 20 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-14 14:22:05,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2698220.0, ans=0.0 2024-08-14 14:22:13,091 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
31 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-14 14:22:14,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2698320.0, ans=0.1 2024-08-14 14:22:19,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2698320.0, ans=0.0 2024-08-14 14:22:39,101 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 9000, loss[loss=0.1173, beats_loss=0.009804, ecapa_loss=0.0001444, whisper_loss=0.106, over 22823.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01067, ecapa_loss=0.000154, whisper_loss=0.09149, over 3870048.73 frames. ], batch size: 89, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:22:39,102 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-14 14:23:17,700 INFO [train_multi_KD3.py:1149] (2/4) Epoch 19, validation on ASR_libri: loss=0.2527, beats_loss=0, ecapa_loss=0.0005393, whisper_loss=0.2473, over 922467.00 frames. 2024-08-14 14:23:35,716 INFO [train_multi_KD3.py:1149] (2/4) Epoch 19, validation on SV_voxceleb1: loss=0.00426, beats_loss=0, ecapa_loss=0.000426, whisper_loss=0, over 939242.00 frames. 2024-08-14 14:25:24,110 INFO [train_multi_KD3.py:1149] (2/4) Epoch 19, validation on AT_audioset: loss=0.02357, beats_loss=0.02357, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 14:25:24,114 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-14 14:25:26,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2698520.0, ans=0.0 2024-08-14 14:25:27,788 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.52 vs. 
limit=15.0 2024-08-14 14:25:35,735 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.359e+01 2.561e+01 2.926e+01 5.640e+01, threshold=5.122e+01, percent-clipped=1.0 2024-08-14 14:25:45,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2698620.0, ans=0.125 2024-08-14 14:25:54,701 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.81 vs. limit=12.0 2024-08-14 14:26:03,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2698720.0, ans=0.0 2024-08-14 14:26:15,426 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 14:26:36,572 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.22 vs. limit=10.0 2024-08-14 14:26:38,393 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 9050, loss[loss=0.1052, beats_loss=0.009573, ecapa_loss=0.0001797, whisper_loss=0.09379, over 18808.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01061, ecapa_loss=0.0001543, whisper_loss=0.09202, over 3873078.77 frames. ], batch size: 79, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:26:39,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2699020.0, ans=0.2 2024-08-14 14:26:39,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2699020.0, ans=0.0 2024-08-14 14:26:54,685 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.52 vs. 
limit=15.0 2024-08-14 14:27:19,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2699220.0, ans=0.2 2024-08-14 14:27:24,109 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.36 vs. limit=22.5 2024-08-14 14:27:44,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2699420.0, ans=0.2 2024-08-14 14:27:45,646 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 14:27:52,346 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 9100, loss[loss=0.09012, beats_loss=0.01096, ecapa_loss=0.0001786, whisper_loss=0.07738, over 21539.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0106, ecapa_loss=0.0001551, whisper_loss=0.09135, over 3874833.00 frames. ], batch size: 90, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:27:56,054 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-14 14:27:59,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2699520.0, ans=0.125 2024-08-14 14:28:04,577 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.245e+01 2.534e+01 2.882e+01 3.902e+01, threshold=5.067e+01, percent-clipped=0.0 2024-08-14 14:28:06,413 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-14 14:28:24,416 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 14:28:32,819 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.74 vs. 
limit=15.0 2024-08-14 14:28:43,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2699820.0, ans=0.05 2024-08-14 14:28:50,564 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-14 14:28:56,834 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 18 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-14 14:29:01,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2699920.0, ans=0.125 2024-08-14 14:29:06,696 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 9150, loss[loss=0.1022, beats_loss=0.01192, ecapa_loss=0.000157, whisper_loss=0.08868, over 22563.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01058, ecapa_loss=0.0001555, whisper_loss=0.09187, over 3894526.11 frames. ], batch size: 94, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:29:11,013 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 39 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-14 14:29:11,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2700020.0, ans=0.125 2024-08-14 14:29:15,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2700020.0, ans=0.125 2024-08-14 14:29:40,413 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
27 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 14:29:48,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2700220.0, ans=0.0 2024-08-14 14:30:07,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2700420.0, ans=0.0 2024-08-14 14:30:19,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2700520.0, ans=0.125 2024-08-14 14:30:19,885 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 9200, loss[loss=0.09201, beats_loss=0.01283, ecapa_loss=0.0001508, whisper_loss=0.07767, over 22024.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0106, ecapa_loss=0.0001561, whisper_loss=0.0919, over 3909886.05 frames. ], batch size: 93, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:30:23,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2700520.0, ans=0.1 2024-08-14 14:30:27,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2700520.0, ans=0.125 2024-08-14 14:30:31,059 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.280e+01 2.601e+01 2.975e+01 5.180e+01, threshold=5.201e+01, percent-clipped=1.0 2024-08-14 14:30:31,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2700520.0, ans=0.1 2024-08-14 14:30:35,718 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-14 14:30:40,046 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-14 14:30:57,548 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
21 from LS+wenet, 36 from Vox, 35 fro AS 2024-08-14 14:31:01,830 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 14:31:03,269 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-14 14:31:13,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2700820.0, ans=0.125 2024-08-14 14:31:18,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2700920.0, ans=0.1 2024-08-14 14:31:23,553 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-14 14:31:29,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2700920.0, ans=0.0 2024-08-14 14:31:31,822 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 9250, loss[loss=0.09171, beats_loss=0.01119, ecapa_loss=0.0001837, whisper_loss=0.07869, over 22360.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01059, ecapa_loss=0.0001568, whisper_loss=0.09163, over 3914860.24 frames. ], batch size: 93, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:31:38,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2701020.0, ans=0.0 2024-08-14 14:32:03,179 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-14 14:32:11,798 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
25 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 14:32:14,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2701320.0, ans=0.125 2024-08-14 14:32:18,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2701320.0, ans=0.1 2024-08-14 14:32:20,902 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-14 14:32:30,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2701420.0, ans=0.125 2024-08-14 14:32:43,972 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 9300, loss[loss=0.1071, beats_loss=0.008647, ecapa_loss=0.0001714, whisper_loss=0.09673, over 23152.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01061, ecapa_loss=0.0001571, whisper_loss=0.09166, over 3938326.47 frames. ], batch size: 92, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:32:56,080 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.362e+01 2.551e+01 2.899e+01 4.764e+01, threshold=5.103e+01, percent-clipped=0.0 2024-08-14 14:32:57,770 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
28 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 14:33:10,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2701620.0, ans=0.125 2024-08-14 14:33:12,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2701720.0, ans=0.0 2024-08-14 14:33:33,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2701820.0, ans=0.0 2024-08-14 14:33:40,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2701820.0, ans=0.0 2024-08-14 14:33:46,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2701920.0, ans=0.0 2024-08-14 14:33:56,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2702020.0, ans=0.0 2024-08-14 14:33:57,672 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 9350, loss[loss=0.08691, beats_loss=0.01152, ecapa_loss=0.0001435, whisper_loss=0.07396, over 19017.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01059, ecapa_loss=0.0001575, whisper_loss=0.09191, over 3926974.57 frames. ], batch size: 76, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:34:11,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2702120.0, ans=0.0 2024-08-14 14:34:13,068 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.281e+01 2024-08-14 14:34:14,031 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 18 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-14 14:34:18,370 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-14 14:34:27,173 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
15 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 14:34:27,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2702220.0, ans=0.125 2024-08-14 14:34:40,092 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-14 14:34:41,081 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.45 vs. limit=6.0 2024-08-14 14:34:44,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2702320.0, ans=0.125 2024-08-14 14:34:51,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2702320.0, ans=0.125 2024-08-14 14:35:11,771 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 9400, loss[loss=0.1182, beats_loss=0.01038, ecapa_loss=0.0001693, whisper_loss=0.1061, over 23417.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01059, ecapa_loss=0.0001575, whisper_loss=0.09159, over 3923169.29 frames. ], batch size: 93, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:35:15,079 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 14:35:18,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2702520.0, ans=0.125 2024-08-14 14:35:21,660 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.18 vs. 
limit=15.0 2024-08-14 14:35:23,610 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.407e+01 2.622e+01 2.905e+01 1.999e+02, threshold=5.243e+01, percent-clipped=1.0 2024-08-14 14:35:31,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2702620.0, ans=0.0 2024-08-14 14:35:37,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2702620.0, ans=0.125 2024-08-14 14:35:43,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2702720.0, ans=0.0 2024-08-14 14:35:47,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2702720.0, ans=0.125 2024-08-14 14:35:51,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2702720.0, ans=0.125 2024-08-14 14:36:02,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2702820.0, ans=0.125 2024-08-14 14:36:02,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2702820.0, ans=0.125 2024-08-14 14:36:04,674 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 21 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-14 14:36:05,929 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
21 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 14:36:11,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2702920.0, ans=0.2 2024-08-14 14:36:14,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2702920.0, ans=0.125 2024-08-14 14:36:25,176 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 9450, loss[loss=0.1198, beats_loss=0.01198, ecapa_loss=0.0001512, whisper_loss=0.1063, over 22743.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01066, ecapa_loss=0.0001566, whisper_loss=0.09063, over 3901497.32 frames. ], batch size: 90, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:36:27,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2703020.0, ans=0.125 2024-08-14 14:36:36,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2703020.0, ans=0.125 2024-08-14 14:36:37,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2703020.0, ans=0.2 2024-08-14 14:36:47,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2703120.0, ans=0.125 2024-08-14 14:37:04,529 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-14 14:37:11,999 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 29 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-14 14:37:14,681 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
26 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-14 14:37:20,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2703320.0, ans=0.0 2024-08-14 14:37:37,331 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 9500, loss[loss=0.1129, beats_loss=0.01004, ecapa_loss=0.0001475, whisper_loss=0.1014, over 23596.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01066, ecapa_loss=0.0001562, whisper_loss=0.09099, over 3900113.04 frames. ], batch size: 93, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:37:40,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2703520.0, ans=0.1 2024-08-14 14:37:46,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2703520.0, ans=0.125 2024-08-14 14:37:47,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2703520.0, ans=0.125 2024-08-14 14:37:48,411 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.31 vs. limit=15.0 2024-08-14 14:37:48,997 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.397e+01 2.649e+01 2.966e+01 9.786e+01, threshold=5.299e+01, percent-clipped=1.0 2024-08-14 14:37:49,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2703520.0, ans=0.2 2024-08-14 14:37:56,979 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.53 vs. 
limit=22.5 2024-08-14 14:37:59,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2703620.0, ans=0.0 2024-08-14 14:38:08,116 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-14 14:38:35,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2703920.0, ans=0.0 2024-08-14 14:38:39,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2703920.0, ans=0.125 2024-08-14 14:38:46,798 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 30 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-14 14:38:50,510 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 9550, loss[loss=0.09747, beats_loss=0.01015, ecapa_loss=0.0001502, whisper_loss=0.08582, over 18303.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0107, ecapa_loss=0.0001563, whisper_loss=0.09062, over 3890969.45 frames. ], batch size: 74, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:39:15,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2704120.0, ans=0.2 2024-08-14 14:39:16,265 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.15 vs. 
limit=22.5 2024-08-14 14:39:27,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2704220.0, ans=0.125 2024-08-14 14:39:29,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2704220.0, ans=0.125 2024-08-14 14:39:35,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=2704220.0, ans=15.0 2024-08-14 14:40:14,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2704420.0, ans=0.125 2024-08-14 14:40:17,085 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 9600, loss[loss=0.1152, beats_loss=0.009477, ecapa_loss=0.0001444, whisper_loss=0.1042, over 21008.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01064, ecapa_loss=0.0001565, whisper_loss=0.09131, over 3919680.54 frames. ], batch size: 83, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:40:18,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2704520.0, ans=0.05 2024-08-14 14:40:30,467 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-14 14:40:31,793 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.443e+01 2.792e+01 3.086e+01 6.637e+01, threshold=5.584e+01, percent-clipped=2.0 2024-08-14 14:40:42,426 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-14 14:40:49,077 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
31 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-14 14:40:57,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2704720.0, ans=0.1 2024-08-14 14:41:02,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2704720.0, ans=0.125 2024-08-14 14:41:48,890 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 9650, loss[loss=0.1223, beats_loss=0.01021, ecapa_loss=0.0001266, whisper_loss=0.1108, over 16078.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01062, ecapa_loss=0.0001569, whisper_loss=0.09105, over 3897892.81 frames. ], batch size: 60, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:41:49,937 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0 2024-08-14 14:42:02,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2705020.0, ans=0.0 2024-08-14 14:42:02,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2705020.0, ans=0.125 2024-08-14 14:42:02,771 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 14:42:02,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2705020.0, ans=0.125 2024-08-14 14:42:13,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2705120.0, ans=0.0 2024-08-14 14:42:21,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2705220.0, ans=0.0 2024-08-14 14:42:26,449 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
23 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-14 14:42:28,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2705220.0, ans=0.0 2024-08-14 14:42:28,549 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.26 vs. limit=12.0 2024-08-14 14:42:38,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2705320.0, ans=0.125 2024-08-14 14:42:47,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2705320.0, ans=0.0 2024-08-14 14:42:52,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2705420.0, ans=0.0 2024-08-14 14:42:59,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2705420.0, ans=0.0 2024-08-14 14:43:05,803 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 9700, loss[loss=0.1019, beats_loss=0.0112, ecapa_loss=0.0001612, whisper_loss=0.08905, over 21322.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.0001569, whisper_loss=0.09091, over 3894923.88 frames. ], batch size: 89, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:43:17,851 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.210e+01 2.464e+01 2.850e+01 7.455e+01, threshold=4.928e+01, percent-clipped=1.0 2024-08-14 14:43:24,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2705620.0, ans=0.015 2024-08-14 14:43:38,003 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-14 14:43:46,759 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
22 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-14 14:43:50,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2705820.0, ans=0.125 2024-08-14 14:44:20,317 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 9750, loss[loss=0.1135, beats_loss=0.00804, ecapa_loss=0.0001592, whisper_loss=0.1039, over 20597.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01065, ecapa_loss=0.000155, whisper_loss=0.09043, over 3886384.97 frames. ], batch size: 76, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:44:38,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2706120.0, ans=0.1 2024-08-14 14:44:41,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2706120.0, ans=0.0 2024-08-14 14:44:43,821 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 13 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-14 14:45:15,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2706320.0, ans=0.07 2024-08-14 14:45:19,519 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 41 from LS+wenet, 28 from Vox, 24 fro AS 2024-08-14 14:45:19,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2706320.0, ans=0.2 2024-08-14 14:45:25,538 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-14 14:45:37,059 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 9800, loss[loss=0.1113, beats_loss=0.01271, ecapa_loss=0.0001193, whisper_loss=0.09743, over 19061.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0107, ecapa_loss=0.0001549, whisper_loss=0.09054, over 3883012.25 frames. 
], batch size: 73, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:45:49,080 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.322e+01 2.608e+01 2.964e+01 4.916e+01, threshold=5.216e+01, percent-clipped=0.0 2024-08-14 14:46:00,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2706620.0, ans=0.125 2024-08-14 14:46:14,644 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.77 vs. limit=15.0 2024-08-14 14:46:22,202 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-14 14:46:37,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2706920.0, ans=0.125 2024-08-14 14:46:39,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2706920.0, ans=0.125 2024-08-14 14:46:47,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2706920.0, ans=0.0 2024-08-14 14:46:51,638 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 9850, loss[loss=0.07374, beats_loss=0.009753, ecapa_loss=0.000186, whisper_loss=0.06213, over 13030.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01065, ecapa_loss=0.0001556, whisper_loss=0.0909, over 3883933.80 frames. ], batch size: 55, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:46:57,444 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-14 14:47:26,965 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
16 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-14 14:47:33,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2707220.0, ans=0.125 2024-08-14 14:47:39,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2707320.0, ans=0.125 2024-08-14 14:48:04,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2707420.0, ans=0.0 2024-08-14 14:48:07,265 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 9900, loss[loss=0.1095, beats_loss=0.01157, ecapa_loss=0.000171, whisper_loss=0.09622, over 19105.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01063, ecapa_loss=0.0001558, whisper_loss=0.09122, over 3879127.52 frames. ], batch size: 76, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:48:08,433 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.92 vs. limit=15.0 2024-08-14 14:48:10,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2707520.0, ans=0.0 2024-08-14 14:48:18,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2707520.0, ans=0.125 2024-08-14 14:48:19,590 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.359e+01 2.713e+01 2.970e+01 4.614e+01, threshold=5.426e+01, percent-clipped=0.0 2024-08-14 14:48:22,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2707620.0, ans=0.2 2024-08-14 14:48:25,139 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
28 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-14 14:48:25,814 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.61 vs. limit=6.0 2024-08-14 14:48:26,066 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.29 vs. limit=15.0 2024-08-14 14:48:27,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2707620.0, ans=0.125 2024-08-14 14:48:32,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2707620.0, ans=0.125 2024-08-14 14:48:42,995 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 14:48:49,306 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.22 vs. limit=15.0 2024-08-14 14:48:53,910 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-14 14:48:54,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2707720.0, ans=0.1 2024-08-14 14:49:38,540 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 9950, loss[loss=0.1025, beats_loss=0.0114, ecapa_loss=0.0001595, whisper_loss=0.08952, over 17297.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01056, ecapa_loss=0.0001562, whisper_loss=0.09094, over 3857653.96 frames. ], batch size: 68, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:49:46,734 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-14 14:49:48,916 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
15 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-14 14:50:10,143 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.59 vs. limit=15.0 2024-08-14 14:50:18,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2708120.0, ans=0.125 2024-08-14 14:50:23,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2708220.0, ans=0.0 2024-08-14 14:50:29,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2708220.0, ans=0.125 2024-08-14 14:50:32,829 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-14 14:50:46,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2708320.0, ans=0.125 2024-08-14 14:50:51,748 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-14 14:51:19,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2708420.0, ans=0.125 2024-08-14 14:51:27,267 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 10000, loss[loss=0.1015, beats_loss=0.01104, ecapa_loss=0.0001811, whisper_loss=0.08863, over 20426.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01061, ecapa_loss=0.0001564, whisper_loss=0.09109, over 3854583.93 frames. ], batch size: 88, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:51:46,223 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.366e+01 2.562e+01 2.817e+01 3.470e+01, threshold=5.124e+01, percent-clipped=0.0 2024-08-14 14:51:54,853 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
22 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-14 14:52:43,494 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 35 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-14 14:52:45,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2708920.0, ans=0.125 2024-08-14 14:52:51,632 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 25 from LS+wenet, 17 from Vox, 53 fro AS 2024-08-14 14:52:58,560 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 10050, loss[loss=0.1178, beats_loss=0.009083, ecapa_loss=0.0001394, whisper_loss=0.1073, over 20951.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.0001556, whisper_loss=0.0908, over 3873218.27 frames. ], batch size: 80, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:53:00,113 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-14 14:53:01,973 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 21 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-14 14:53:03,274 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-14 14:53:12,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2709120.0, ans=0.0 2024-08-14 14:53:37,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2709220.0, ans=0.0 2024-08-14 14:54:04,406 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-14 14:54:16,978 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 10100, loss[loss=0.1038, beats_loss=0.01078, ecapa_loss=0.0001487, whisper_loss=0.09157, over 19647.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01079, ecapa_loss=0.000155, whisper_loss=0.09013, over 3892455.59 frames. 
], batch size: 77, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:54:29,029 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.269e+01 2.495e+01 2.791e+01 4.696e+01, threshold=4.989e+01, percent-clipped=0.0 2024-08-14 14:54:36,633 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 14:54:36,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2709620.0, ans=0.0 2024-08-14 14:54:46,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2709720.0, ans=0.2 2024-08-14 14:54:48,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2709720.0, ans=0.0 2024-08-14 14:54:54,661 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.14 vs. limit=15.0 2024-08-14 14:55:09,842 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-14 14:55:11,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2709820.0, ans=0.125 2024-08-14 14:55:18,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2709920.0, ans=0.125 2024-08-14 14:55:20,697 WARNING [optim.py:496] (2/4) Scaling gradients by 0.040875811129808426, model_norm_threshold=49.8900260925293 2024-08-14 14:55:20,927 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.21, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.113e+05, grad_sumsq=3.113e+05, orig_rms_sq=1.000e+00 2024-08-14 14:55:24,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2709920.0, ans=0.2 2024-08-14 14:55:32,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2709920.0, ans=0.125 2024-08-14 14:55:34,421 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 10150, loss[loss=0.1028, beats_loss=0.01132, ecapa_loss=0.000158, whisper_loss=0.08989, over 22497.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01076, ecapa_loss=0.0001565, whisper_loss=0.09094, over 3925992.01 frames. ], batch size: 93, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:55:51,076 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.76 vs. limit=22.5 2024-08-14 14:56:00,197 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 14:56:09,209 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
25 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-14 14:56:39,869 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 14:56:41,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2710420.0, ans=0.2 2024-08-14 14:56:49,353 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 14:56:51,703 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 10200, loss[loss=0.09292, beats_loss=0.01303, ecapa_loss=0.0001152, whisper_loss=0.07874, over 22477.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01085, ecapa_loss=0.0001557, whisper_loss=0.09006, over 3940084.15 frames. ], batch size: 89, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:57:04,293 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.342e+01 2.619e+01 2.972e+01 1.221e+03, threshold=5.239e+01, percent-clipped=2.0 2024-08-14 14:57:06,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2710620.0, ans=0.125 2024-08-14 14:57:18,871 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 32 from Vox, 37 fro AS 2024-08-14 14:57:19,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2710620.0, ans=0.07 2024-08-14 14:57:20,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2710720.0, ans=0.0 2024-08-14 14:57:27,376 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
24 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 14:57:41,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2710820.0, ans=0.125 2024-08-14 14:57:42,848 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 14:57:50,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2710820.0, ans=0.0 2024-08-14 14:57:57,842 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-14 14:58:08,650 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 10250, loss[loss=0.09353, beats_loss=0.01227, ecapa_loss=0.0001409, whisper_loss=0.07985, over 18426.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01082, ecapa_loss=0.0001565, whisper_loss=0.08986, over 3911266.57 frames. ], batch size: 74, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:58:17,862 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 14:58:23,415 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 25 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-14 14:58:52,403 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-14 14:58:58,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2711320.0, ans=0.125 2024-08-14 14:59:00,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2711320.0, ans=0.2 2024-08-14 14:59:10,230 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 30 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-14 14:59:14,494 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
25 from LS+wenet, 19 from Vox, 32 from AS 2024-08-14 14:59:22,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2711420.0, ans=0.2 2024-08-14 14:59:29,337 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 10300, loss[loss=0.1106, beats_loss=0.007523, ecapa_loss=0.0001928, whisper_loss=0.1011, over 18908.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01082, ecapa_loss=0.0001547, whisper_loss=0.08986, over 3924430.99 frames. ], batch size: 75, lr: 3.27e-03, grad_scale: 1.152921504606847e+18 2024-08-14 14:59:41,486 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.309e+01 2.627e+01 3.015e+01 4.712e+01, threshold=5.254e+01, percent-clipped=0.0 2024-08-14 14:59:51,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2711620.0, ans=0.125 2024-08-14 14:59:51,393 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.23 vs. limit=15.0 2024-08-14 15:00:02,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2711720.0, ans=0.2 2024-08-14 15:00:03,390 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.23 vs. limit=22.5 2024-08-14 15:00:17,232 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 30 from Vox, 34 from AS 2024-08-14 15:00:21,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=2711820.0, ans=0.02 2024-08-14 15:00:21,989 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.55 vs.
limit=15.0 2024-08-14 15:00:24,389 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.43 vs. limit=22.5 2024-08-14 15:00:27,063 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 from AS 2024-08-14 15:00:51,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2711920.0, ans=0.1 2024-08-14 15:00:54,239 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 10350, loss[loss=0.1088, beats_loss=0.01234, ecapa_loss=0.0001208, whisper_loss=0.09522, over 23892.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01084, ecapa_loss=0.0001544, whisper_loss=0.08998, over 3924564.11 frames. ], batch size: 92, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:01:12,273 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.09 vs. limit=15.0 2024-08-14 15:01:14,423 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 23 from Vox, 47 from AS 2024-08-14 15:01:15,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2712120.0, ans=0.125 2024-08-14 15:01:23,065 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.17 vs. limit=22.5 2024-08-14 15:01:36,375 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 15:02:01,396 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts.
29 from LS+wenet, 16 from Vox, 46 from AS 2024-08-14 15:02:02,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2712420.0, ans=0.125 2024-08-14 15:02:09,836 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 25 from Vox, 41 from AS 2024-08-14 15:02:12,283 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 10400, loss[loss=0.1059, beats_loss=0.009263, ecapa_loss=0.0001729, whisper_loss=0.09494, over 16019.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0108, ecapa_loss=0.0001555, whisper_loss=0.08987, over 3925202.37 frames. ], batch size: 67, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:02:21,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2712520.0, ans=0.2 2024-08-14 15:02:25,685 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.275e+01 2.638e+01 3.125e+01 4.616e+01, threshold=5.275e+01, percent-clipped=0.0 2024-08-14 15:02:28,154 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.65 vs. limit=22.5 2024-08-14 15:02:41,979 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 21 from LS+wenet, 19 from Vox, 39 from AS 2024-08-14 15:02:42,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2712720.0, ans=0.09899494936611666 2024-08-14 15:02:42,714 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.47 vs.
limit=12.0 2024-08-14 15:02:46,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2712720.0, ans=0.025 2024-08-14 15:02:58,397 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.70 vs. limit=6.0 2024-08-14 15:03:04,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2712820.0, ans=0.125 2024-08-14 15:03:19,560 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 38 from LS+wenet, 18 from Vox, 33 from AS 2024-08-14 15:03:26,966 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 10450, loss[loss=0.1111, beats_loss=0.007648, ecapa_loss=0.0001756, whisper_loss=0.1017, over 17884.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01076, ecapa_loss=0.0001545, whisper_loss=0.09057, over 3922786.36 frames. ], batch size: 73, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:03:33,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2713020.0, ans=0.125 2024-08-14 15:03:34,965 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 17 from Vox, 20 from AS 2024-08-14 15:03:37,962 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 12 from Vox, 40 from AS 2024-08-14 15:03:43,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2713120.0, ans=0.04949747468305833 2024-08-14 15:03:50,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2713120.0, ans=0.125 2024-08-14 15:04:11,167 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts.
32 from LS+wenet, 19 from Vox, 32 from AS 2024-08-14 15:04:18,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2713320.0, ans=0.2 2024-08-14 15:04:30,332 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 15 from LS+wenet, 21 from Vox, 26 from AS 2024-08-14 15:04:30,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2713420.0, ans=0.2 2024-08-14 15:04:37,959 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 25 from Vox, 16 from AS 2024-08-14 15:04:42,078 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 10500, loss[loss=0.08406, beats_loss=0.009778, ecapa_loss=0.0001667, whisper_loss=0.07262, over 16571.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01076, ecapa_loss=0.0001556, whisper_loss=0.09015, over 3882949.49 frames. ], batch size: 69, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:04:55,511 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.389e+01 2.560e+01 2.877e+01 3.688e+01, threshold=5.121e+01, percent-clipped=0.0 2024-08-14 15:04:56,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2713620.0, ans=0.125 2024-08-14 15:05:21,307 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 23 from Vox, 26 from AS 2024-08-14 15:05:38,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2713820.0, ans=0.125 2024-08-14 15:05:41,018 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 20 from Vox, 41 from AS 2024-08-14 15:05:52,601 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts.
19 from LS+wenet, 17 from Vox, 35 from AS 2024-08-14 15:05:56,581 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 10550, loss[loss=0.1146, beats_loss=0.009681, ecapa_loss=0.0001536, whisper_loss=0.1034, over 16469.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01074, ecapa_loss=0.0001562, whisper_loss=0.0901, over 3876904.87 frames. ], batch size: 64, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:06:06,341 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.15 vs. limit=15.0 2024-08-14 15:06:12,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2714120.0, ans=0.1 2024-08-14 15:06:35,330 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 14 from Vox, 33 from AS 2024-08-14 15:06:37,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=2714220.0, ans=0.2 2024-08-14 15:06:41,342 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 21 from LS+wenet, 18 from Vox, 18 from AS 2024-08-14 15:06:47,705 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 15:07:01,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2714420.0, ans=0.1 2024-08-14 15:07:02,400 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 18 from Vox, 36 from AS 2024-08-14 15:07:05,397 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 27 from LS+wenet, 28 from Vox, 24 from AS 2024-08-14 15:07:10,918 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 10600, loss[loss=0.1381, beats_loss=0.007891, ecapa_loss=0.0001791, whisper_loss=0.1284, over 23638.00 frames.
], tot_loss[loss=0.1025, beats_loss=0.01071, ecapa_loss=0.000157, whisper_loss=0.09025, over 3903380.01 frames. ], batch size: 92, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:07:24,555 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.333e+01 2.524e+01 2.900e+01 4.921e+01, threshold=5.049e+01, percent-clipped=0.0 2024-08-14 15:07:58,968 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 from AS 2024-08-14 15:08:00,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2714820.0, ans=0.2 2024-08-14 15:08:06,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2714820.0, ans=0.125 2024-08-14 15:08:21,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2714920.0, ans=0.125 2024-08-14 15:08:23,995 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 28 from LS+wenet, 22 from Vox, 35 from AS 2024-08-14 15:08:25,297 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 10650, loss[loss=0.109, beats_loss=0.009431, ecapa_loss=0.0001649, whisper_loss=0.09794, over 22484.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01071, ecapa_loss=0.0001555, whisper_loss=0.09031, over 3878581.40 frames. ], batch size: 85, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:08:27,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2715020.0, ans=0.125 2024-08-14 15:08:29,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2715020.0, ans=0.0 2024-08-14 15:08:36,792 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.13 vs.
limit=15.0 2024-08-14 15:08:39,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2715120.0, ans=0.1 2024-08-14 15:09:09,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2715320.0, ans=0.0 2024-08-14 15:09:11,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2715320.0, ans=0.07 2024-08-14 15:09:15,188 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 from AS 2024-08-14 15:09:26,098 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.80 vs. limit=15.0 2024-08-14 15:09:39,653 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 10700, loss[loss=0.0843, beats_loss=0.01078, ecapa_loss=0.0001777, whisper_loss=0.07174, over 15427.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01073, ecapa_loss=0.0001548, whisper_loss=0.09035, over 3884467.51 frames. ], batch size: 64, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:09:53,061 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.367e+01 2.619e+01 3.037e+01 4.020e+01, threshold=5.239e+01, percent-clipped=0.0 2024-08-14 15:10:00,064 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.82 vs. limit=15.0 2024-08-14 15:10:00,839 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts.
22 from LS+wenet, 23 from Vox, 39 from AS 2024-08-14 15:10:04,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2715620.0, ans=0.125 2024-08-14 15:10:04,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2715620.0, ans=0.125 2024-08-14 15:10:06,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2715620.0, ans=0.2 2024-08-14 15:10:09,816 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.207e+01 2024-08-14 15:10:17,044 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 24 from Vox, 35 from AS 2024-08-14 15:10:29,041 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 19 from Vox, 33 from AS 2024-08-14 15:10:35,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2715820.0, ans=0.125 2024-08-14 15:10:41,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2715920.0, ans=0.0 2024-08-14 15:10:44,978 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.64 vs. limit=22.5 2024-08-14 15:10:54,552 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 10750, loss[loss=0.08034, beats_loss=0.01569, ecapa_loss=0.0001009, whisper_loss=0.06364, over 19790.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01067, ecapa_loss=0.000155, whisper_loss=0.09112, over 3891049.71 frames. ], batch size: 81, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:10:54,811 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts.
26 from LS+wenet, 26 from Vox, 42 from AS 2024-08-14 15:11:03,797 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 15 from Vox, 49 from AS 2024-08-14 15:11:05,444 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 28 from Vox, 32 from AS 2024-08-14 15:11:22,040 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 26 from LS+wenet, 14 from Vox, 32 from AS 2024-08-14 15:11:25,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2716220.0, ans=0.1 2024-08-14 15:11:25,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2716220.0, ans=0.125 2024-08-14 15:11:40,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2716320.0, ans=0.125 2024-08-14 15:11:50,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2716320.0, ans=0.1 2024-08-14 15:12:05,848 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 20 from Vox, 46 from AS 2024-08-14 15:12:07,208 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 15 from Vox, 32 from AS 2024-08-14 15:12:09,939 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 10800, loss[loss=0.09013, beats_loss=0.009696, ecapa_loss=0.0002041, whisper_loss=0.07839, over 18886.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01068, ecapa_loss=0.0001548, whisper_loss=0.09168, over 3915330.73 frames.
], batch size: 80, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:12:23,579 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.404e+01 2.650e+01 3.101e+01 5.207e+01, threshold=5.300e+01, percent-clipped=0.0 2024-08-14 15:12:36,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2716620.0, ans=0.0 2024-08-14 15:12:40,019 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 from AS 2024-08-14 15:12:48,049 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 35 from LS+wenet, 19 from Vox, 34 from AS 2024-08-14 15:12:53,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2716820.0, ans=0.1 2024-08-14 15:12:56,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2716820.0, ans=0.2 2024-08-14 15:13:03,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2716820.0, ans=0.125 2024-08-14 15:13:08,734 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 25 from Vox, 37 from AS 2024-08-14 15:13:11,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2716920.0, ans=0.125 2024-08-14 15:13:23,482 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 10850, loss[loss=0.06536, beats_loss=0.01185, ecapa_loss=0.000155, whisper_loss=0.05197, over 14676.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01068, ecapa_loss=0.0001563, whisper_loss=0.09192, over 3918028.02 frames. ], batch size: 61, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:13:34,785 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.77 vs.
limit=22.5 2024-08-14 15:14:01,505 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 36 from LS+wenet, 22 from Vox, 31 from AS 2024-08-14 15:14:03,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2717220.0, ans=0.125 2024-08-14 15:14:10,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2717320.0, ans=0.0 2024-08-14 15:14:14,670 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.518e-03 2024-08-14 15:14:18,753 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 from AS 2024-08-14 15:14:28,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2717420.0, ans=0.125 2024-08-14 15:14:37,954 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 14 from Vox, 33 from AS 2024-08-14 15:14:39,153 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 10900, loss[loss=0.08756, beats_loss=0.01339, ecapa_loss=0.0001657, whisper_loss=0.07251, over 15645.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01065, ecapa_loss=0.0001566, whisper_loss=0.09177, over 3936609.16 frames. ], batch size: 64, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:14:39,488 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 18 from Vox, 34 from AS 2024-08-14 15:14:52,843 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.325e+01 2.589e+01 2.879e+01 4.786e+01, threshold=5.178e+01, percent-clipped=0.0 2024-08-14 15:14:58,871 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 22 from Vox, 38 from AS 2024-08-14 15:15:08,995 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts.
25 from LS+wenet, 27 from Vox, 35 from AS 2024-08-14 15:15:13,658 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 15 from Vox, 30 from AS 2024-08-14 15:15:22,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2717820.0, ans=0.1 2024-08-14 15:15:48,406 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.33 vs. limit=15.0 2024-08-14 15:15:53,885 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 10950, loss[loss=0.09776, beats_loss=0.01096, ecapa_loss=0.0001682, whisper_loss=0.08511, over 21152.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01064, ecapa_loss=0.0001553, whisper_loss=0.09178, over 3910631.62 frames. ], batch size: 89, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:15:54,798 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.99 vs. limit=15.0 2024-08-14 15:16:07,083 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 from AS 2024-08-14 15:16:08,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2718120.0, ans=0.0 2024-08-14 15:16:11,352 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 25 from Vox, 42 from AS 2024-08-14 15:16:14,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2718120.0, ans=0.125 2024-08-14 15:16:20,668 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 20 from Vox, 37 from AS 2024-08-14 15:16:38,345 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts.
21 from LS+wenet, 21 from Vox, 22 from AS 2024-08-14 15:16:38,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2718320.0, ans=0.2 2024-08-14 15:16:45,921 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 from AS 2024-08-14 15:16:54,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2718420.0, ans=0.125 2024-08-14 15:17:10,042 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 11000, loss[loss=0.08869, beats_loss=0.008683, ecapa_loss=0.0001813, whisper_loss=0.07819, over 16588.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01055, ecapa_loss=0.0001563, whisper_loss=0.09177, over 3899396.57 frames. ], batch size: 70, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:17:10,305 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 30 from LS+wenet, 16 from Vox, 32 from AS 2024-08-14 15:17:17,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2718520.0, ans=0.125 2024-08-14 15:17:25,021 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.292e+01 2.575e+01 2.886e+01 4.359e+01, threshold=5.149e+01, percent-clipped=0.0 2024-08-14 15:17:28,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2718620.0, ans=0.0 2024-08-14 15:17:31,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2718620.0, ans=0.125 2024-08-14 15:17:48,047 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 20 from LS+wenet, 26 from Vox, 41 from AS 2024-08-14 15:17:54,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2718720.0, ans=0.1 2024-08-14 15:17:55,734 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts.
17 from LS+wenet, 32 from Vox, 39 from AS 2024-08-14 15:18:03,950 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 14 from Vox, 30 from AS 2024-08-14 15:18:34,254 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 11050, loss[loss=0.1193, beats_loss=0.01004, ecapa_loss=0.0001618, whisper_loss=0.1076, over 21320.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01059, ecapa_loss=0.0001548, whisper_loss=0.0912, over 3908293.98 frames. ], batch size: 87, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:18:44,503 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 12 from LS+wenet, 19 from Vox, 27 from AS 2024-08-14 15:19:09,830 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 24 from Vox, 27 from AS 2024-08-14 15:19:13,536 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2024-08-14 15:19:53,038 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 19 from Vox, 36 from AS 2024-08-14 15:19:57,642 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 17 from Vox, 20 from AS 2024-08-14 15:20:00,476 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 11100, loss[loss=0.122, beats_loss=0.007562, ecapa_loss=0.0001705, whisper_loss=0.1127, over 15427.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01061, ecapa_loss=0.0001549, whisper_loss=0.09087, over 3886032.70 frames. ], batch size: 59, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:20:01,694 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.96 vs. limit=15.0 2024-08-14 15:20:05,862 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.34 vs.
limit=22.5 2024-08-14 15:20:14,253 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.445e+01 2.651e+01 2.947e+01 5.465e+01, threshold=5.303e+01, percent-clipped=1.0 2024-08-14 15:20:33,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2719720.0, ans=0.125 2024-08-14 15:20:36,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2719720.0, ans=0.0 2024-08-14 15:20:36,462 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.57 vs. limit=15.0 2024-08-14 15:20:45,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2719820.0, ans=0.1 2024-08-14 15:20:48,272 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.04 vs. limit=15.0 2024-08-14 15:20:54,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2719820.0, ans=0.0 2024-08-14 15:21:05,823 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 22 from LS+wenet, 21 from Vox, 40 from AS 2024-08-14 15:21:19,887 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 11150, loss[loss=0.09932, beats_loss=0.01054, ecapa_loss=0.0001489, whisper_loss=0.08729, over 22221.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01063, ecapa_loss=0.0001546, whisper_loss=0.09052, over 3920174.59 frames.
], batch size: 91, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:21:26,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2720020.0, ans=0.0 2024-08-14 15:21:31,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2720020.0, ans=0.0 2024-08-14 15:21:54,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2720220.0, ans=0.125 2024-08-14 15:21:54,508 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.58 vs. limit=15.0 2024-08-14 15:21:58,921 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.74 vs. limit=15.0 2024-08-14 15:22:06,465 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.83 vs. limit=15.0 2024-08-14 15:22:26,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2720420.0, ans=0.0 2024-08-14 15:22:33,434 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 11200, loss[loss=0.106, beats_loss=0.008826, ecapa_loss=0.0001434, whisper_loss=0.09573, over 13916.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01051, ecapa_loss=0.0001556, whisper_loss=0.09147, over 3903277.47 frames. ], batch size: 54, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:22:46,575 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.435e+01 2.587e+01 2.892e+01 4.591e+01, threshold=5.173e+01, percent-clipped=0.0 2024-08-14 15:22:46,851 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
32 from LS+wenet, 32 from Vox, 29 from AS 2024-08-14 15:22:50,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2720620.0, ans=0.125 2024-08-14 15:22:58,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2720620.0, ans=0.125 2024-08-14 15:23:03,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2720720.0, ans=0.0 2024-08-14 15:23:06,329 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 18 from Vox, 28 from AS 2024-08-14 15:23:30,390 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 21 from Vox, 27 from AS 2024-08-14 15:23:36,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2720920.0, ans=0.1 2024-08-14 15:23:36,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2720920.0, ans=0.2 2024-08-14 15:23:47,394 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 11250, loss[loss=0.1142, beats_loss=0.00931, ecapa_loss=0.0001787, whisper_loss=0.1031, over 17237.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01049, ecapa_loss=0.0001576, whisper_loss=0.09171, over 3903166.23 frames.
], batch size: 69, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:23:48,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2721020.0, ans=0.1 2024-08-14 15:24:13,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2721120.0, ans=0.125 2024-08-14 15:24:15,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2721120.0, ans=0.1 2024-08-14 15:24:16,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2721220.0, ans=0.0 2024-08-14 15:24:19,864 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.87 vs. limit=22.5 2024-08-14 15:24:54,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2721420.0, ans=0.125 2024-08-14 15:25:08,401 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 11300, loss[loss=0.1109, beats_loss=0.01202, ecapa_loss=0.0001143, whisper_loss=0.0977, over 14321.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01052, ecapa_loss=0.0001572, whisper_loss=0.09152, over 3915149.03 frames. 
], batch size: 54, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:25:16,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2721520.0, ans=0.09899494936611666 2024-08-14 15:25:21,489 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.316e+01 2.542e+01 2.891e+01 3.051e+02, threshold=5.084e+01, percent-clipped=1.0 2024-08-14 15:25:34,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2721620.0, ans=0.125 2024-08-14 15:25:52,120 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.37 vs. limit=22.5 2024-08-14 15:26:06,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2721820.0, ans=0.0 2024-08-14 15:26:12,900 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.59 vs. limit=15.0 2024-08-14 15:26:20,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2721920.0, ans=0.125 2024-08-14 15:26:25,491 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 11350, loss[loss=0.1003, beats_loss=0.01169, ecapa_loss=0.0001579, whisper_loss=0.08707, over 22365.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0105, ecapa_loss=0.0001565, whisper_loss=0.09141, over 3928593.21 frames. ], batch size: 93, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:27:19,406 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.39 vs. 
limit=12.0 2024-08-14 15:27:21,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2722320.0, ans=0.0 2024-08-14 15:27:51,012 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.48 vs. limit=15.0 2024-08-14 15:27:52,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2722420.0, ans=0.125 2024-08-14 15:27:59,131 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 11400, loss[loss=0.0988, beats_loss=0.011, ecapa_loss=0.0001747, whisper_loss=0.08606, over 21918.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01058, ecapa_loss=0.000157, whisper_loss=0.09086, over 3874582.58 frames. ], batch size: 89, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:28:13,451 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.371e+01 2.609e+01 2.947e+01 4.785e+01, threshold=5.218e+01, percent-clipped=0.0 2024-08-14 15:28:23,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2722620.0, ans=0.125 2024-08-14 15:28:34,840 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 15:28:35,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2722720.0, ans=0.1 2024-08-14 15:28:46,402 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.21 vs. 
limit=22.5 2024-08-14 15:29:00,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2722820.0, ans=0.125 2024-08-14 15:29:15,074 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.88 vs. limit=15.0 2024-08-14 15:29:31,498 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 11450, loss[loss=0.09276, beats_loss=0.01157, ecapa_loss=0.0001485, whisper_loss=0.07971, over 18953.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01075, ecapa_loss=0.0001556, whisper_loss=0.08996, over 3893296.21 frames. ], batch size: 75, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:29:37,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2723020.0, ans=0.125 2024-08-14 15:30:27,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2723220.0, ans=0.1 2024-08-14 15:30:28,456 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 25 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-14 15:30:53,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2723320.0, ans=0.0 2024-08-14 15:31:12,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2723420.0, ans=0.125 2024-08-14 15:31:30,648 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 11500, loss[loss=0.09264, beats_loss=0.009609, ecapa_loss=0.0001469, whisper_loss=0.08156, over 17141.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01072, ecapa_loss=0.0001555, whisper_loss=0.0898, over 3870443.35 frames. 
], batch size: 69, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:31:52,269 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.370e+01 2.644e+01 2.916e+01 4.086e+01, threshold=5.287e+01, percent-clipped=0.0 2024-08-14 15:32:05,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2723620.0, ans=0.125 2024-08-14 15:32:08,862 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.07 vs. limit=15.0 2024-08-14 15:32:12,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2723620.0, ans=0.125 2024-08-14 15:32:21,193 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-14 15:32:38,168 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 37 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 15:32:42,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2723820.0, ans=0.125 2024-08-14 15:33:31,510 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 11550, loss[loss=0.11, beats_loss=0.01009, ecapa_loss=0.0001615, whisper_loss=0.09829, over 21579.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0107, ecapa_loss=0.0001545, whisper_loss=0.09004, over 3884932.83 frames. 
], batch size: 88, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:33:41,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=2724020.0, ans=10.0 2024-08-14 15:33:59,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2724120.0, ans=0.07 2024-08-14 15:34:09,886 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=15.0 2024-08-14 15:34:14,914 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 26 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 15:34:20,294 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 15:34:33,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2724220.0, ans=0.125 2024-08-14 15:34:58,892 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 15:35:08,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2724420.0, ans=0.0 2024-08-14 15:35:09,683 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 33 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 15:35:16,387 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 11600, loss[loss=0.1252, beats_loss=0.007589, ecapa_loss=0.0001954, whisper_loss=0.1156, over 20379.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01072, ecapa_loss=0.000155, whisper_loss=0.08956, over 3884980.71 frames. 
], batch size: 81, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:35:29,421 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.402e+01 2.609e+01 2.881e+01 4.573e+01, threshold=5.219e+01, percent-clipped=0.0 2024-08-14 15:35:30,230 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.49 vs. limit=6.0 2024-08-14 15:35:38,105 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-14 15:35:38,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2724620.0, ans=0.125 2024-08-14 15:35:41,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2724620.0, ans=0.2 2024-08-14 15:35:43,707 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 28 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-14 15:36:04,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2724820.0, ans=0.125 2024-08-14 15:36:28,211 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 11650, loss[loss=0.1035, beats_loss=0.01181, ecapa_loss=0.0001811, whisper_loss=0.08991, over 22251.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01073, ecapa_loss=0.0001553, whisper_loss=0.09007, over 3892353.62 frames. ], batch size: 93, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:36:36,325 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
31 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-14 15:36:40,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2725020.0, ans=0.125 2024-08-14 15:36:41,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2725020.0, ans=0.0 2024-08-14 15:37:06,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2725220.0, ans=0.0 2024-08-14 15:37:06,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2725220.0, ans=0.125 2024-08-14 15:37:08,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2725220.0, ans=0.0 2024-08-14 15:37:11,453 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 15:37:13,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2725320.0, ans=0.125 2024-08-14 15:37:14,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2725320.0, ans=0.2 2024-08-14 15:37:15,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2725320.0, ans=0.0 2024-08-14 15:37:38,999 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.02 vs. limit=12.0 2024-08-14 15:37:40,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2725420.0, ans=0.125 2024-08-14 15:37:44,571 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 11700, loss[loss=0.1299, beats_loss=0.009535, ecapa_loss=0.0001669, whisper_loss=0.1187, over 23368.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.0108, ecapa_loss=0.0001557, whisper_loss=0.0899, over 3923915.96 frames. ], batch size: 93, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:37:46,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2725520.0, ans=0.1 2024-08-14 15:37:59,645 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.338e+01 2.598e+01 2.950e+01 6.638e+01, threshold=5.196e+01, percent-clipped=2.0 2024-08-14 15:38:05,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2725620.0, ans=0.2 2024-08-14 15:38:10,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2725620.0, ans=0.0 2024-08-14 15:38:14,465 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.73 vs. limit=15.0 2024-08-14 15:38:34,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2725820.0, ans=0.1 2024-08-14 15:38:39,571 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-14 15:38:43,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2725820.0, ans=0.125 2024-08-14 15:38:51,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2725920.0, ans=0.125 2024-08-14 15:39:06,021 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
23 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 15:39:10,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2726020.0, ans=0.1 2024-08-14 15:39:10,725 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.71 vs. limit=15.0 2024-08-14 15:39:11,344 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 11750, loss[loss=0.09428, beats_loss=0.01212, ecapa_loss=0.0001577, whisper_loss=0.08058, over 22327.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01077, ecapa_loss=0.0001559, whisper_loss=0.09079, over 3928722.28 frames. ], batch size: 92, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:39:21,496 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 15:39:27,474 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.98 vs. limit=15.0 2024-08-14 15:39:44,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2726220.0, ans=0.0 2024-08-14 15:39:46,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2726220.0, ans=0.0 2024-08-14 15:40:04,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2726320.0, ans=0.1 2024-08-14 15:40:06,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2726320.0, ans=0.0 2024-08-14 15:40:14,190 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 15:40:32,283 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 11800, loss[loss=0.1125, beats_loss=0.01159, ecapa_loss=0.0001532, whisper_loss=0.09938, over 20803.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01077, ecapa_loss=0.0001561, whisper_loss=0.09148, over 3937571.25 frames. ], batch size: 81, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:40:34,102 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-14 15:40:45,179 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.511e+01 2.719e+01 3.108e+01 4.014e+02, threshold=5.439e+01, percent-clipped=2.0 2024-08-14 15:40:47,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2726620.0, ans=0.0 2024-08-14 15:40:52,916 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-14 15:41:00,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2726720.0, ans=0.0 2024-08-14 15:41:03,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2726720.0, ans=0.125 2024-08-14 15:41:11,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2726720.0, ans=0.125 2024-08-14 15:41:18,044 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 15:41:30,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2726920.0, ans=0.07 2024-08-14 15:41:32,722 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.89 vs. 
limit=15.0 2024-08-14 15:41:44,901 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 11850, loss[loss=0.1087, beats_loss=0.008493, ecapa_loss=0.0001632, whisper_loss=0.09857, over 21115.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01077, ecapa_loss=0.0001563, whisper_loss=0.0914, over 3953423.14 frames. ], batch size: 87, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:41:47,823 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-14 15:41:57,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2727120.0, ans=0.0 2024-08-14 15:42:17,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2727220.0, ans=0.125 2024-08-14 15:42:26,375 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-14 15:42:33,104 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.71 vs. limit=6.0 2024-08-14 15:42:36,060 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.26 vs. limit=15.0 2024-08-14 15:42:54,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2727420.0, ans=0.125 2024-08-14 15:42:56,343 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 11900, loss[loss=0.09947, beats_loss=0.01017, ecapa_loss=0.0001657, whisper_loss=0.08764, over 18212.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01071, ecapa_loss=0.0001565, whisper_loss=0.09184, over 3957608.76 frames. ], batch size: 75, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:43:00,755 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
20 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-14 15:43:01,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2727520.0, ans=0.125 2024-08-14 15:43:07,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2727520.0, ans=0.1 2024-08-14 15:43:09,808 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.301e+01 2.664e+01 2.917e+01 5.181e+01, threshold=5.327e+01, percent-clipped=0.0 2024-08-14 15:43:11,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2727620.0, ans=0.125 2024-08-14 15:43:16,181 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 28 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 15:43:16,813 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.71 vs. limit=15.0 2024-08-14 15:43:19,136 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 15:43:26,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2727720.0, ans=0.1 2024-08-14 15:43:27,365 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-14 15:43:34,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2727720.0, ans=0.125 2024-08-14 15:43:36,667 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-14 15:44:09,913 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 11950, loss[loss=0.1109, beats_loss=0.01085, ecapa_loss=0.0001616, whisper_loss=0.09838, over 22503.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01061, ecapa_loss=0.0001571, whisper_loss=0.0922, over 3949765.60 frames. ], batch size: 89, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:44:10,127 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 17 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-14 15:44:13,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2728020.0, ans=0.125 2024-08-14 15:44:22,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2728020.0, ans=0.0 2024-08-14 15:44:28,725 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.56 vs. limit=15.0 2024-08-14 15:44:32,126 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 15:44:36,401 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-14 15:44:48,268 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-14 15:45:04,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2728320.0, ans=0.05 2024-08-14 15:45:12,648 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.51 vs. limit=15.0 2024-08-14 15:45:17,421 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-14 15:45:23,067 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 12000, loss[loss=0.08959, beats_loss=0.01042, ecapa_loss=0.0001535, whisper_loss=0.07764, over 14242.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01069, ecapa_loss=0.0001559, whisper_loss=0.09139, over 3920802.07 frames. 
], batch size: 57, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:45:23,068 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-14 15:46:00,604 INFO [train_multi_KD3.py:1149] (2/4) Epoch 19, validation on ASR_libri: loss=0.2528, beats_loss=0, ecapa_loss=0.000545, whisper_loss=0.2473, over 922467.00 frames. 2024-08-14 15:46:09,966 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.4404, 1.7057, 2.2710, 1.1351, 1.3740, 1.6149, 2.2141, 2.0006], device='cuda:2') 2024-08-14 15:46:18,362 INFO [train_multi_KD3.py:1149] (2/4) Epoch 19, validation on SV_voxceleb1: loss=0.004271, beats_loss=0, ecapa_loss=0.0004271, whisper_loss=0, over 939242.00 frames. 2024-08-14 15:48:10,157 INFO [train_multi_KD3.py:1149] (2/4) Epoch 19, validation on AT_audioset: loss=0.0235, beats_loss=0.0235, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 15:48:10,161 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-14 15:48:23,897 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.361e+01 2.603e+01 2.893e+01 4.151e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-14 15:48:39,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2728720.0, ans=0.0 2024-08-14 15:48:46,444 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 15 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-14 15:48:50,977 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 27 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-14 15:48:57,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2728820.0, ans=0.125 2024-08-14 15:48:58,474 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
23 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-14 15:48:59,044 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 15:49:03,832 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.17 vs. limit=15.0 2024-08-14 15:49:16,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2728920.0, ans=0.125 2024-08-14 15:49:25,036 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 12050, loss[loss=0.08384, beats_loss=0.01256, ecapa_loss=0.0001251, whisper_loss=0.07003, over 14362.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0107, ecapa_loss=0.0001552, whisper_loss=0.09096, over 3876957.66 frames. ], batch size: 58, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:49:49,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2729120.0, ans=0.025 2024-08-14 15:49:49,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2729120.0, ans=0.0 2024-08-14 15:49:54,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2729220.0, ans=0.1 2024-08-14 15:50:12,995 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 15:50:13,656 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. 
limit=15.0 2024-08-14 15:50:16,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2729320.0, ans=0.0 2024-08-14 15:50:23,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2729420.0, ans=0.0 2024-08-14 15:50:24,855 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-14 15:50:32,384 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 39 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-14 15:50:36,907 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.670e-03 2024-08-14 15:50:37,987 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 16 from LS+wenet, 30 from Vox, 23 fro AS 2024-08-14 15:50:38,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2729520.0, ans=0.1 2024-08-14 15:50:39,261 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 12100, loss[loss=0.08356, beats_loss=0.01021, ecapa_loss=0.0002172, whisper_loss=0.07117, over 15981.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01072, ecapa_loss=0.0001557, whisper_loss=0.09058, over 3867904.46 frames. ], batch size: 69, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:50:44,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2729520.0, ans=0.0 2024-08-14 15:50:52,534 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.283e+01 2.551e+01 2.892e+01 3.951e+01, threshold=5.101e+01, percent-clipped=0.0 2024-08-14 15:51:01,272 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 15:51:14,365 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
16 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 15:51:44,859 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-14 15:51:48,555 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.60 vs. limit=15.0 2024-08-14 15:51:51,843 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 12150, loss[loss=0.0952, beats_loss=0.01333, ecapa_loss=0.0001383, whisper_loss=0.08049, over 21897.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01075, ecapa_loss=0.0001543, whisper_loss=0.09041, over 3907465.58 frames. ], batch size: 89, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:51:54,510 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.11 vs. limit=15.0 2024-08-14 15:52:08,975 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.65 vs. limit=10.0 2024-08-14 15:52:22,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2730220.0, ans=0.2 2024-08-14 15:52:23,905 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.30 vs. limit=22.5 2024-08-14 15:52:26,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2730220.0, ans=0.1 2024-08-14 15:52:31,576 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-14 15:52:38,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2730320.0, ans=0.125 2024-08-14 15:52:39,434 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
28 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-14 15:52:40,750 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 20 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 15:52:42,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2730320.0, ans=0.125 2024-08-14 15:52:59,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2730420.0, ans=0.125 2024-08-14 15:53:02,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2730420.0, ans=0.1 2024-08-14 15:53:06,508 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 12200, loss[loss=0.1112, beats_loss=0.01148, ecapa_loss=0.0001436, whisper_loss=0.09831, over 22529.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01072, ecapa_loss=0.0001539, whisper_loss=0.09085, over 3912266.19 frames. ], batch size: 90, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:53:19,558 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.701e+01 2.397e+01 2.639e+01 2.869e+01 4.830e+01, threshold=5.277e+01, percent-clipped=0.0 2024-08-14 15:53:20,458 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.28 vs. limit=22.5 2024-08-14 15:53:38,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2730720.0, ans=0.1 2024-08-14 15:53:45,122 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-14 15:54:19,676 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 12250, loss[loss=0.1235, beats_loss=0.01055, ecapa_loss=0.0001364, whisper_loss=0.1116, over 22165.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01069, ecapa_loss=0.0001549, whisper_loss=0.09098, over 3901533.61 frames. ], batch size: 89, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:54:27,423 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-14 15:54:35,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2731120.0, ans=0.125 2024-08-14 15:54:38,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2731120.0, ans=0.2 2024-08-14 15:54:46,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2731120.0, ans=0.0 2024-08-14 15:54:49,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2731220.0, ans=0.125 2024-08-14 15:55:28,283 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 15:55:31,385 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-14 15:55:32,677 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 12300, loss[loss=0.09633, beats_loss=0.01104, ecapa_loss=0.0001302, whisper_loss=0.08399, over 23113.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0107, ecapa_loss=0.0001548, whisper_loss=0.09109, over 3919276.28 frames. ], batch size: 91, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:55:36,186 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-14 15:55:45,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2731520.0, ans=0.125 2024-08-14 15:55:46,100 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.098e+01 2.391e+01 2.726e+01 3.127e+01 1.434e+02, threshold=5.452e+01, percent-clipped=1.0 2024-08-14 15:55:52,019 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 15:56:09,982 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-14 15:56:14,155 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-14 15:56:19,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2731820.0, ans=0.2 2024-08-14 15:56:42,260 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 15:56:46,350 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 12350, loss[loss=0.09829, beats_loss=0.01177, ecapa_loss=0.0001551, whisper_loss=0.08497, over 19590.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01069, ecapa_loss=0.0001558, whisper_loss=0.09099, over 3878601.85 frames. ], batch size: 82, lr: 3.26e-03, grad_scale: 1.152921504606847e+18 2024-08-14 15:56:48,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2732020.0, ans=0.125 2024-08-14 15:56:59,352 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-14 15:57:15,820 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 15:57:17,343 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-14 15:57:19,321 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.41 vs. limit=15.0 2024-08-14 15:57:22,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2732220.0, ans=0.0 2024-08-14 15:57:42,373 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-14 15:57:43,411 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.08 vs. limit=8.0 2024-08-14 15:57:48,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2732420.0, ans=0.2 2024-08-14 15:58:00,296 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 12400, loss[loss=0.1084, beats_loss=0.0102, ecapa_loss=0.0001777, whisper_loss=0.09644, over 17177.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01073, ecapa_loss=0.0001546, whisper_loss=0.09058, over 3875971.56 frames. ], batch size: 70, lr: 3.26e-03, grad_scale: 1.152921504606847e+18 2024-08-14 15:58:13,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2732520.0, ans=0.125 2024-08-14 15:58:13,963 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.330e+01 2.578e+01 2.980e+01 5.348e+02, threshold=5.156e+01, percent-clipped=2.0 2024-08-14 15:58:28,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2732620.0, ans=0.125 2024-08-14 15:58:40,513 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.64 vs. 
limit=15.0 2024-08-14 15:58:43,391 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.00 vs. limit=22.5 2024-08-14 15:59:01,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2732920.0, ans=0.0 2024-08-14 15:59:14,833 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 12450, loss[loss=0.09389, beats_loss=0.01035, ecapa_loss=0.000188, whisper_loss=0.08166, over 17787.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01072, ecapa_loss=0.000155, whisper_loss=0.09029, over 3884929.51 frames. ], batch size: 71, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:59:17,497 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2024-08-14 15:59:42,636 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.34 vs. limit=15.0 2024-08-14 15:59:42,686 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.66 vs. limit=6.0 2024-08-14 15:59:45,247 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 15:59:51,112 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2024-08-14 16:00:14,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2733420.0, ans=0.1 2024-08-14 16:00:25,153 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
28 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-14 16:00:25,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2733420.0, ans=0.0 2024-08-14 16:00:30,566 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 12500, loss[loss=0.09546, beats_loss=0.01304, ecapa_loss=0.0001317, whisper_loss=0.0811, over 18357.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0107, ecapa_loss=0.0001551, whisper_loss=0.09033, over 3865904.97 frames. ], batch size: 73, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:00:37,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2733520.0, ans=0.0 2024-08-14 16:00:43,237 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-14 16:00:45,940 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.338e+01 2.506e+01 2.817e+01 7.820e+01, threshold=5.011e+01, percent-clipped=1.0 2024-08-14 16:00:55,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2733620.0, ans=0.0 2024-08-14 16:01:08,820 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-14 16:01:20,905 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.70 vs. limit=12.0 2024-08-14 16:01:32,252 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2024-08-14 16:01:35,096 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-14 16:01:35,933 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
28 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-14 16:01:46,190 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 12550, loss[loss=0.1221, beats_loss=0.008852, ecapa_loss=0.000152, whisper_loss=0.1117, over 21708.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01072, ecapa_loss=0.0001546, whisper_loss=0.09021, over 3875231.11 frames. ], batch size: 83, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:01:46,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2734020.0, ans=0.0 2024-08-14 16:01:52,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2734020.0, ans=0.125 2024-08-14 16:02:01,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2734120.0, ans=0.1 2024-08-14 16:02:18,630 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 16:02:32,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2734320.0, ans=0.0 2024-08-14 16:02:43,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2734320.0, ans=0.025 2024-08-14 16:02:45,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2734420.0, ans=0.0 2024-08-14 16:02:51,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2734420.0, ans=0.125 2024-08-14 16:02:55,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2734420.0, ans=0.2 2024-08-14 16:02:55,762 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.09 
vs. limit=15.0 2024-08-14 16:02:59,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2734520.0, ans=0.125 2024-08-14 16:03:00,577 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 12600, loss[loss=0.09375, beats_loss=0.01107, ecapa_loss=0.0001661, whisper_loss=0.08102, over 22056.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01074, ecapa_loss=0.0001558, whisper_loss=0.09023, over 3869007.51 frames. ], batch size: 91, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:03:14,512 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.270e+01 2.592e+01 3.036e+01 4.281e+01, threshold=5.185e+01, percent-clipped=0.0 2024-08-14 16:03:22,110 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-14 16:03:23,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2734620.0, ans=0.125 2024-08-14 16:03:31,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2734720.0, ans=0.0 2024-08-14 16:03:33,395 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.24 vs. limit=22.5 2024-08-14 16:03:42,006 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 28 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-14 16:03:43,607 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-14 16:03:55,787 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.24 vs. 
limit=22.5 2024-08-14 16:04:07,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2734920.0, ans=0.125 2024-08-14 16:04:14,030 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 12650, loss[loss=0.1047, beats_loss=0.01079, ecapa_loss=0.0001565, whisper_loss=0.09232, over 17208.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01081, ecapa_loss=0.0001552, whisper_loss=0.08989, over 3867445.50 frames. ], batch size: 70, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:04:43,667 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.52 vs. limit=6.0 2024-08-14 16:04:50,398 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-14 16:04:50,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2735220.0, ans=0.125 2024-08-14 16:04:56,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2735220.0, ans=0.1 2024-08-14 16:05:02,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2735320.0, ans=0.125 2024-08-14 16:05:04,002 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.59 vs. limit=12.0 2024-08-14 16:05:07,731 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
18 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-14 16:05:18,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2735420.0, ans=0.125 2024-08-14 16:05:18,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2735420.0, ans=0.125 2024-08-14 16:05:23,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2735420.0, ans=0.0 2024-08-14 16:05:25,908 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=15.0 2024-08-14 16:05:27,826 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 12700, loss[loss=0.1036, beats_loss=0.01213, ecapa_loss=0.0001561, whisper_loss=0.08995, over 21886.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01077, ecapa_loss=0.0001547, whisper_loss=0.09093, over 3869068.97 frames. ], batch size: 88, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:05:30,173 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.09 vs. limit=15.0 2024-08-14 16:05:39,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2735520.0, ans=0.1 2024-08-14 16:05:40,623 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0 2024-08-14 16:05:42,524 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.717e+01 2.369e+01 2.524e+01 2.927e+01 4.569e+01, threshold=5.048e+01, percent-clipped=0.0 2024-08-14 16:05:55,781 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
22 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-14 16:06:05,675 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 16:06:09,284 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=15.0 2024-08-14 16:06:10,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2735820.0, ans=0.1 2024-08-14 16:06:15,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2735820.0, ans=0.125 2024-08-14 16:06:21,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2735820.0, ans=0.125 2024-08-14 16:06:35,418 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-14 16:06:41,531 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 12750, loss[loss=0.1122, beats_loss=0.009946, ecapa_loss=0.00016, whisper_loss=0.1007, over 22668.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01084, ecapa_loss=0.0001541, whisper_loss=0.09134, over 3884826.60 frames. 
], batch size: 92, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:06:43,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2736020.0, ans=0.1 2024-08-14 16:06:47,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2736020.0, ans=0.125 2024-08-14 16:06:52,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2736020.0, ans=0.0 2024-08-14 16:07:10,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2736220.0, ans=0.1 2024-08-14 16:07:16,764 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.76 vs. limit=12.0 2024-08-14 16:07:17,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2736220.0, ans=0.1 2024-08-14 16:07:19,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2736220.0, ans=0.125 2024-08-14 16:07:32,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2736320.0, ans=0.125 2024-08-14 16:07:55,042 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 12800, loss[loss=0.07998, beats_loss=0.01372, ecapa_loss=0.0001503, whisper_loss=0.06476, over 18573.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01079, ecapa_loss=0.0001549, whisper_loss=0.09178, over 3872101.85 frames. ], batch size: 79, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:07:55,920 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.90 vs. 
limit=22.5 2024-08-14 16:08:09,449 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.300e+01 2.515e+01 2.756e+01 3.404e+01, threshold=5.031e+01, percent-clipped=0.0 2024-08-14 16:08:25,957 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 18 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 16:09:01,721 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 30 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 16:09:01,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2736920.0, ans=0.125 2024-08-14 16:09:09,123 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 12850, loss[loss=0.1151, beats_loss=0.01185, ecapa_loss=0.0001581, whisper_loss=0.1016, over 22214.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01072, ecapa_loss=0.0001548, whisper_loss=0.09222, over 3898238.87 frames. ], batch size: 90, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:09:21,401 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2737020.0, ans=0.125 2024-08-14 16:09:24,069 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 16 from Vox, 50 fro AS 2024-08-14 16:09:34,484 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 18 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-14 16:09:42,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2737220.0, ans=0.0 2024-08-14 16:09:46,567 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.78 vs. 
limit=15.0 2024-08-14 16:09:56,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2737320.0, ans=0.2 2024-08-14 16:09:58,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2737320.0, ans=0.125 2024-08-14 16:10:10,112 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-14 16:10:19,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2737420.0, ans=0.2 2024-08-14 16:10:21,419 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 12900, loss[loss=0.1022, beats_loss=0.01078, ecapa_loss=0.0001422, whisper_loss=0.08999, over 17281.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01072, ecapa_loss=0.0001554, whisper_loss=0.09163, over 3851907.06 frames. ], batch size: 67, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:10:21,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2737520.0, ans=0.125 2024-08-14 16:10:27,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2737520.0, ans=0.125 2024-08-14 16:10:34,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2737620.0, ans=0.2 2024-08-14 16:10:35,679 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.212e+01 2.559e+01 2.809e+01 4.062e+01, threshold=5.118e+01, percent-clipped=0.0 2024-08-14 16:10:40,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2737620.0, ans=0.125 2024-08-14 16:10:40,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, 
batch_count=2737620.0, ans=0.1 2024-08-14 16:11:00,647 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-14 16:11:16,004 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 16:11:26,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2737920.0, ans=0.125 2024-08-14 16:11:33,424 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 16 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-14 16:11:34,554 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 12950, loss[loss=0.09523, beats_loss=0.009775, ecapa_loss=0.0002153, whisper_loss=0.0833, over 12688.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01074, ecapa_loss=0.0001539, whisper_loss=0.09124, over 3849905.51 frames. ], batch size: 55, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:11:39,114 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-14 16:11:39,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2738020.0, ans=0.1 2024-08-14 16:11:54,579 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 16:12:02,873 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 18 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-14 16:12:22,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2738320.0, ans=0.125 2024-08-14 16:12:36,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2738420.0, ans=0.2 2024-08-14 16:12:49,599 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 13000, loss[loss=0.09691, beats_loss=0.01074, ecapa_loss=0.0001449, whisper_loss=0.08473, over 19750.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01075, ecapa_loss=0.0001536, whisper_loss=0.09111, over 3887931.95 frames. ], batch size: 74, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:12:54,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2738520.0, ans=0.0 2024-08-14 16:13:04,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2738620.0, ans=0.0 2024-08-14 16:13:04,749 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.729e+01 2.365e+01 2.543e+01 2.775e+01 1.627e+02, threshold=5.086e+01, percent-clipped=3.0 2024-08-14 16:13:10,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2738620.0, ans=0.125 2024-08-14 16:13:13,848 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-14 16:13:24,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2738720.0, ans=0.125 2024-08-14 16:13:26,318 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-14 16:13:27,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2738720.0, ans=0.125 2024-08-14 16:14:05,333 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 13050, loss[loss=0.1358, beats_loss=0.007892, ecapa_loss=0.0001644, whisper_loss=0.1262, over 17444.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01068, ecapa_loss=0.0001546, whisper_loss=0.09127, over 3856064.08 frames. 
], batch size: 68, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:14:13,590 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2739020.0, ans=0.07 2024-08-14 16:14:41,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2739220.0, ans=0.2 2024-08-14 16:14:46,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2739220.0, ans=0.125 2024-08-14 16:14:50,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2739320.0, ans=0.125 2024-08-14 16:14:50,972 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.26 vs. limit=5.0 2024-08-14 16:14:58,977 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-14 16:15:00,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2739320.0, ans=0.125 2024-08-14 16:15:10,373 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 22 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-14 16:15:13,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2739420.0, ans=0.0 2024-08-14 16:15:16,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2739420.0, ans=0.125 2024-08-14 16:15:18,524 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 13100, loss[loss=0.1055, beats_loss=0.009805, ecapa_loss=0.0001201, whisper_loss=0.09454, over 16675.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01069, ecapa_loss=0.0001539, whisper_loss=0.09121, over 3861431.89 frames. 
], batch size: 62, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:15:23,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2739520.0, ans=0.0 2024-08-14 16:15:33,715 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.291e+01 2.498e+01 2.880e+01 4.346e+01, threshold=4.996e+01, percent-clipped=0.0 2024-08-14 16:15:34,043 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-14 16:15:35,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2739620.0, ans=0.0 2024-08-14 16:15:42,321 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.04 vs. limit=15.0 2024-08-14 16:15:51,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2739720.0, ans=0.2 2024-08-14 16:16:14,008 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 36 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 16:16:14,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2739820.0, ans=0.1 2024-08-14 16:16:23,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2739920.0, ans=0.2 2024-08-14 16:16:30,543 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-14 16:16:33,367 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 13150, loss[loss=0.1147, beats_loss=0.00892, ecapa_loss=0.0001427, whisper_loss=0.1044, over 15399.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01072, ecapa_loss=0.0001529, whisper_loss=0.09127, over 3881950.74 frames. 
], batch size: 57, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:16:37,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2740020.0, ans=0.1 2024-08-14 16:16:51,949 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 16:17:28,582 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-14 16:17:39,195 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-14 16:17:40,578 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-14 16:17:47,445 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 13200, loss[loss=0.09141, beats_loss=0.01362, ecapa_loss=0.0001149, whisper_loss=0.07664, over 22177.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01078, ecapa_loss=0.0001531, whisper_loss=0.09087, over 3871823.39 frames. ], batch size: 90, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:17:52,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2740520.0, ans=0.125 2024-08-14 16:17:58,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2740520.0, ans=0.0 2024-08-14 16:18:02,630 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.404e+01 2.825e+01 3.249e+01 1.605e+02, threshold=5.649e+01, percent-clipped=1.0 2024-08-14 16:18:05,812 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 32 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 16:18:19,245 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.19 vs. limit=15.0 2024-08-14 16:18:33,373 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-14 16:18:37,600 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 16:19:00,926 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 13250, loss[loss=0.09415, beats_loss=0.009527, ecapa_loss=0.0001585, whisper_loss=0.08304, over 15173.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0107, ecapa_loss=0.0001529, whisper_loss=0.09181, over 3887481.88 frames. ], batch size: 61, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:19:13,788 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 16:19:29,501 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.26 vs. limit=15.0 2024-08-14 16:19:29,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2741220.0, ans=0.1 2024-08-14 16:19:33,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2741220.0, ans=0.125 2024-08-14 16:19:33,546 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.02 vs. limit=10.0 2024-08-14 16:19:47,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2741320.0, ans=0.95 2024-08-14 16:19:53,330 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.28 vs. limit=15.0 2024-08-14 16:20:12,819 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 13300, loss[loss=0.1176, beats_loss=0.01148, ecapa_loss=0.0001465, whisper_loss=0.1047, over 16030.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01063, ecapa_loss=0.0001528, whisper_loss=0.09192, over 3908733.65 frames. 
], batch size: 62, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:20:22,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2741520.0, ans=0.1 2024-08-14 16:20:24,295 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.53 vs. limit=22.5 2024-08-14 16:20:27,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2741620.0, ans=0.125 2024-08-14 16:20:27,896 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.357e+01 2.636e+01 2.927e+01 4.489e+01, threshold=5.273e+01, percent-clipped=0.0 2024-08-14 16:20:43,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2741720.0, ans=0.1 2024-08-14 16:20:57,792 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.172e-01 2024-08-14 16:20:58,748 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 25 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-14 16:21:04,086 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.60 vs. limit=22.5 2024-08-14 16:21:26,663 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 13350, loss[loss=0.111, beats_loss=0.008956, ecapa_loss=0.0001588, whisper_loss=0.1005, over 19922.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01059, ecapa_loss=0.000154, whisper_loss=0.09198, over 3886918.18 frames. ], batch size: 78, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:21:30,144 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
28 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-14 16:21:30,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2742020.0, ans=0.2 2024-08-14 16:21:40,370 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 40 from LS+wenet, 11 from Vox, 40 fro AS 2024-08-14 16:21:43,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2742120.0, ans=0.2 2024-08-14 16:21:45,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2742120.0, ans=0.125 2024-08-14 16:21:46,815 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=15.0 2024-08-14 16:22:03,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=2742220.0, ans=15.0 2024-08-14 16:22:09,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2742220.0, ans=0.0 2024-08-14 16:22:16,946 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.21 vs. limit=15.0 2024-08-14 16:22:17,673 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 14 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-14 16:22:38,157 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 16:22:38,888 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-08-14 16:22:40,977 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 13400, loss[loss=0.09257, beats_loss=0.01195, ecapa_loss=0.0001427, whisper_loss=0.07919, over 16522.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.01058, ecapa_loss=0.0001547, whisper_loss=0.09192, over 3906048.15 frames. ], batch size: 66, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:22:41,494 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-14 16:22:55,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2742620.0, ans=0.125 2024-08-14 16:22:55,931 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.422e+01 2.683e+01 3.045e+01 1.877e+02, threshold=5.367e+01, percent-clipped=2.0 2024-08-14 16:23:08,933 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.88 vs. limit=15.0 2024-08-14 16:23:20,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2742720.0, ans=0.125 2024-08-14 16:23:21,500 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-14 16:23:31,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2742820.0, ans=0.125 2024-08-14 16:23:31,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2742820.0, ans=0.125 2024-08-14 16:23:40,234 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-14 16:23:54,016 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2024-08-14 16:23:54,576 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 13450, loss[loss=0.1105, beats_loss=0.008407, ecapa_loss=0.000136, whisper_loss=0.1008, over 18260.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.0106, ecapa_loss=0.0001549, whisper_loss=0.09185, over 3896534.70 frames. ], batch size: 71, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:24:02,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2743020.0, ans=0.125 2024-08-14 16:24:09,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2743120.0, ans=0.125 2024-08-14 16:24:11,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2743120.0, ans=0.0 2024-08-14 16:24:42,492 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-14 16:24:45,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2743320.0, ans=0.125 2024-08-14 16:24:45,963 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.99 vs. limit=12.0 2024-08-14 16:24:49,111 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.97 vs. limit=15.0 2024-08-14 16:24:51,661 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-14 16:24:54,884 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.71 vs. limit=12.0 2024-08-14 16:24:57,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2743420.0, ans=0.0 2024-08-14 16:25:08,618 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 13500, loss[loss=0.06792, beats_loss=0.01349, ecapa_loss=0.0001452, whisper_loss=0.05298, over 18685.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01058, ecapa_loss=0.0001562, whisper_loss=0.09103, over 3870754.96 frames. ], batch size: 78, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:25:12,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2743520.0, ans=0.05 2024-08-14 16:25:18,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2743520.0, ans=0.125 2024-08-14 16:25:23,173 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.695e+01 2.281e+01 2.536e+01 2.815e+01 4.454e+01, threshold=5.072e+01, percent-clipped=0.0 2024-08-14 16:25:25,082 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-14 16:25:35,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2743620.0, ans=0.07 2024-08-14 16:25:36,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2743720.0, ans=0.125 2024-08-14 16:25:43,981 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 16:25:44,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2743720.0, ans=0.2 2024-08-14 16:26:04,677 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-14 16:26:05,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2743820.0, ans=0.0 2024-08-14 16:26:14,449 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.27 vs. 
limit=22.5 2024-08-14 16:26:22,166 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 13550, loss[loss=0.09356, beats_loss=0.01086, ecapa_loss=0.0001802, whisper_loss=0.08089, over 17548.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0107, ecapa_loss=0.0001556, whisper_loss=0.09058, over 3879428.78 frames. ], batch size: 69, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:26:25,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2744020.0, ans=0.0 2024-08-14 16:26:29,985 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 23 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-14 16:26:32,815 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 20 from LS+wenet, 12 from Vox, 44 fro AS 2024-08-14 16:26:41,372 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 19 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-14 16:26:52,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2744220.0, ans=0.125 2024-08-14 16:26:54,655 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 16:26:56,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2744220.0, ans=0.125 2024-08-14 16:27:01,920 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-14 16:27:05,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2744320.0, ans=0.09899494936611666 2024-08-14 16:27:12,125 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
18 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-14 16:27:15,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2744320.0, ans=0.125 2024-08-14 16:27:16,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2744320.0, ans=0.125 2024-08-14 16:27:30,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2744420.0, ans=0.0 2024-08-14 16:27:34,901 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 13600, loss[loss=0.1014, beats_loss=0.01187, ecapa_loss=0.0001498, whisper_loss=0.088, over 22554.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01074, ecapa_loss=0.0001545, whisper_loss=0.09127, over 3892247.06 frames. ], batch size: 92, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:27:49,310 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.289e+01 2.556e+01 2.921e+01 4.683e+01, threshold=5.111e+01, percent-clipped=0.0 2024-08-14 16:27:55,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2744620.0, ans=0.0 2024-08-14 16:27:57,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2744620.0, ans=0.0 2024-08-14 16:27:59,705 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-14 16:28:01,061 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-14 16:28:08,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2744720.0, ans=0.125 2024-08-14 16:28:39,220 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. 
limit=6.0 2024-08-14 16:28:40,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2744920.0, ans=0.125 2024-08-14 16:28:42,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2744920.0, ans=0.125 2024-08-14 16:28:48,840 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 13650, loss[loss=0.1103, beats_loss=0.01115, ecapa_loss=0.0001497, whisper_loss=0.09766, over 20090.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01086, ecapa_loss=0.0001539, whisper_loss=0.09127, over 3906581.19 frames. ], batch size: 78, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:28:50,578 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 28 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-14 16:28:55,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=2745020.0, ans=0.025 2024-08-14 16:29:02,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2745120.0, ans=0.0 2024-08-14 16:29:08,379 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 18 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-14 16:29:15,464 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
30 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-14 16:29:15,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2745120.0, ans=0.125 2024-08-14 16:29:36,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2745320.0, ans=0.125 2024-08-14 16:29:38,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2745320.0, ans=0.125 2024-08-14 16:29:41,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2745320.0, ans=0.1 2024-08-14 16:29:45,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2745320.0, ans=0.0 2024-08-14 16:29:47,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2745420.0, ans=0.0 2024-08-14 16:30:02,221 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 13700, loss[loss=0.1154, beats_loss=0.01155, ecapa_loss=0.0001485, whisper_loss=0.1024, over 18071.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01084, ecapa_loss=0.0001549, whisper_loss=0.091, over 3867898.01 frames. ], batch size: 70, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:30:16,808 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.735e+01 2.313e+01 2.534e+01 2.793e+01 4.098e+01, threshold=5.069e+01, percent-clipped=0.0 2024-08-14 16:30:23,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2745620.0, ans=0.1 2024-08-14 16:30:34,423 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
28 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-14 16:30:36,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2745720.0, ans=0.0 2024-08-14 16:30:41,332 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 16:30:41,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2745720.0, ans=0.1 2024-08-14 16:30:46,707 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.31 vs. limit=22.5 2024-08-14 16:30:58,868 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 16:31:00,738 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2024-08-14 16:31:01,538 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-14 16:31:02,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2745920.0, ans=0.1 2024-08-14 16:31:03,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2745920.0, ans=0.125 2024-08-14 16:31:04,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2745920.0, ans=0.125 2024-08-14 16:31:08,996 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 15 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 16:31:11,998 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
31 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-14 16:31:14,668 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 13750, loss[loss=0.1104, beats_loss=0.00986, ecapa_loss=0.0001716, whisper_loss=0.0988, over 23236.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01089, ecapa_loss=0.0001546, whisper_loss=0.09056, over 3889919.90 frames. ], batch size: 94, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:31:16,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2746020.0, ans=0.0 2024-08-14 16:31:26,571 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-14 16:31:28,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2746120.0, ans=0.125 2024-08-14 16:31:44,265 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-14 16:31:50,770 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.50 vs. limit=15.0 2024-08-14 16:31:53,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2746220.0, ans=0.125 2024-08-14 16:31:56,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2746220.0, ans=0.0 2024-08-14 16:31:58,902 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 19 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-14 16:32:03,154 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
21 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 16:32:03,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=2746320.0, ans=0.2 2024-08-14 16:32:16,503 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.29 vs. limit=15.0 2024-08-14 16:32:17,346 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 35 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 16:32:25,980 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-14 16:32:28,913 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 13800, loss[loss=0.1058, beats_loss=0.0127, ecapa_loss=0.0001083, whisper_loss=0.09198, over 23480.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01088, ecapa_loss=0.0001538, whisper_loss=0.09046, over 3917145.42 frames. ], batch size: 90, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:32:40,947 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.77 vs. limit=15.0 2024-08-14 16:32:44,543 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 29 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-14 16:32:45,699 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.356e+01 2.629e+01 2.983e+01 1.767e+02, threshold=5.258e+01, percent-clipped=3.0 2024-08-14 16:32:55,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2746620.0, ans=0.0 2024-08-14 16:33:00,843 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
29 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-14 16:33:04,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2746720.0, ans=0.0 2024-08-14 16:33:36,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2746920.0, ans=0.125 2024-08-14 16:33:43,053 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 13850, loss[loss=0.1073, beats_loss=0.01049, ecapa_loss=0.0001504, whisper_loss=0.09527, over 22633.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01071, ecapa_loss=0.0001562, whisper_loss=0.09107, over 3914833.90 frames. ], batch size: 92, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:33:49,274 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 24 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-14 16:34:20,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2747220.0, ans=0.0 2024-08-14 16:34:34,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2747320.0, ans=0.125 2024-08-14 16:34:49,452 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 35 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-14 16:34:56,684 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 13900, loss[loss=0.117, beats_loss=0.01051, ecapa_loss=0.0001575, whisper_loss=0.1049, over 22854.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01068, ecapa_loss=0.0001557, whisper_loss=0.09163, over 3926176.72 frames. 
], batch size: 89, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:35:12,338 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.429e+01 2.660e+01 3.144e+01 1.636e+02, threshold=5.320e+01, percent-clipped=3.0 2024-08-14 16:35:14,460 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 16:35:14,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2747620.0, ans=0.125 2024-08-14 16:35:14,860 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.01 vs. limit=22.5 2024-08-14 16:35:17,131 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 16:35:28,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2747720.0, ans=0.125 2024-08-14 16:35:34,656 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.76 vs. limit=15.0 2024-08-14 16:35:35,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2747720.0, ans=0.125 2024-08-14 16:35:41,401 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.93 vs. limit=15.0 2024-08-14 16:35:59,964 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-14 16:36:05,012 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.98 vs. 
limit=15.0 2024-08-14 16:36:09,805 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 13950, loss[loss=0.1067, beats_loss=0.01269, ecapa_loss=9.781e-05, whisper_loss=0.09299, over 22407.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0107, ecapa_loss=0.0001561, whisper_loss=0.09058, over 3870845.03 frames. ], batch size: 85, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:36:40,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2748220.0, ans=10.0 2024-08-14 16:36:48,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2748220.0, ans=0.125 2024-08-14 16:36:58,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2748320.0, ans=0.125 2024-08-14 16:37:03,861 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-14 16:37:15,884 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.85 vs. limit=22.5 2024-08-14 16:37:22,352 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 14000, loss[loss=0.1224, beats_loss=0.01044, ecapa_loss=0.0001024, whisper_loss=0.1109, over 24834.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0107, ecapa_loss=0.000155, whisper_loss=0.09039, over 3859160.89 frames. ], batch size: 92, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:37:22,507 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 16:37:30,435 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.19 vs. limit=12.0 2024-08-14 16:37:31,384 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
21 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-14 16:37:38,819 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.341e+01 2.629e+01 3.019e+01 1.116e+02, threshold=5.259e+01, percent-clipped=1.0 2024-08-14 16:37:48,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2748620.0, ans=0.1 2024-08-14 16:37:50,961 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 16:37:52,945 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.23 vs. limit=15.0 2024-08-14 16:38:01,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2748720.0, ans=0.125 2024-08-14 16:38:04,482 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-14 16:38:36,757 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 14050, loss[loss=0.1361, beats_loss=0.00782, ecapa_loss=0.0001253, whisper_loss=0.127, over 19164.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01067, ecapa_loss=0.0001543, whisper_loss=0.09114, over 3861978.91 frames. ], batch size: 70, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:38:37,982 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.34 vs. limit=15.0 2024-08-14 16:38:46,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2749020.0, ans=0.125 2024-08-14 16:38:59,612 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-14 16:39:02,887 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.55 vs. 
limit=22.5 2024-08-14 16:39:20,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2749320.0, ans=0.1 2024-08-14 16:39:28,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2749320.0, ans=0.0 2024-08-14 16:39:43,714 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.71 vs. limit=15.0 2024-08-14 16:39:46,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2749420.0, ans=0.125 2024-08-14 16:39:49,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2749520.0, ans=0.5 2024-08-14 16:39:50,084 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 14100, loss[loss=0.1053, beats_loss=0.01211, ecapa_loss=0.0001401, whisper_loss=0.09182, over 19117.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01063, ecapa_loss=0.0001536, whisper_loss=0.09179, over 3879282.55 frames. ], batch size: 75, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:40:00,431 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-14 16:40:06,747 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.359e+01 2.545e+01 2.723e+01 7.272e+01, threshold=5.090e+01, percent-clipped=1.0 2024-08-14 16:40:14,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2749620.0, ans=0.1 2024-08-14 16:40:34,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2749820.0, ans=0.125 2024-08-14 16:40:36,022 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
29 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-14 16:40:39,091 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-14 16:40:45,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2749820.0, ans=0.125 2024-08-14 16:40:52,555 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 17 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 16:40:55,663 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 16:41:04,223 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 14150, loss[loss=0.09751, beats_loss=0.009264, ecapa_loss=0.0001748, whisper_loss=0.0865, over 20022.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01069, ecapa_loss=0.0001539, whisper_loss=0.09103, over 3871206.78 frames. ], batch size: 82, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:41:11,231 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.17 vs. limit=6.0 2024-08-14 16:41:43,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2750220.0, ans=0.125 2024-08-14 16:41:43,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2750220.0, ans=0.0 2024-08-14 16:41:54,229 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.42 vs. limit=22.5 2024-08-14 16:41:58,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2750320.0, ans=0.0 2024-08-14 16:41:59,850 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
13 from LS+wenet, 24 from Vox, 19 fro AS 2024-08-14 16:42:00,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2750320.0, ans=0.1 2024-08-14 16:42:15,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2750420.0, ans=0.1 2024-08-14 16:42:18,188 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 14200, loss[loss=0.1064, beats_loss=0.01138, ecapa_loss=0.0001664, whisper_loss=0.09339, over 22345.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01074, ecapa_loss=0.0001539, whisper_loss=0.09041, over 3883572.64 frames. ], batch size: 92, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:42:18,343 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 19 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-14 16:42:20,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2750520.0, ans=0.0 2024-08-14 16:42:29,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2750520.0, ans=0.125 2024-08-14 16:42:33,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2750620.0, ans=0.125 2024-08-14 16:42:34,486 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+01 2.417e+01 2.674e+01 2.957e+01 3.053e+02, threshold=5.348e+01, percent-clipped=2.0 2024-08-14 16:42:41,448 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.89 vs. 
limit=15.0 2024-08-14 16:42:44,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2750620.0, ans=0.0 2024-08-14 16:42:44,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2750620.0, ans=0.0 2024-08-14 16:42:47,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2750720.0, ans=0.2 2024-08-14 16:42:55,590 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 17 from LS+wenet, 34 from Vox, 35 fro AS 2024-08-14 16:43:03,414 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-14 16:43:06,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2750820.0, ans=0.2 2024-08-14 16:43:07,817 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 16:43:16,946 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-14 16:43:19,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2750920.0, ans=0.125 2024-08-14 16:43:32,638 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 14250, loss[loss=0.1012, beats_loss=0.008777, ecapa_loss=0.0001633, whisper_loss=0.09075, over 14010.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01071, ecapa_loss=0.0001533, whisper_loss=0.09022, over 3909576.32 frames. 
], batch size: 53, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:43:37,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2751020.0, ans=0.2 2024-08-14 16:43:39,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=2751020.0, ans=15.0 2024-08-14 16:43:45,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2751020.0, ans=0.125 2024-08-14 16:43:46,397 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 14 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-14 16:43:46,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2751120.0, ans=0.0 2024-08-14 16:43:57,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2751120.0, ans=0.1 2024-08-14 16:43:58,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2751120.0, ans=0.125 2024-08-14 16:44:20,791 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.18 vs. limit=15.0 2024-08-14 16:44:23,398 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.15 vs. limit=22.5 2024-08-14 16:44:33,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2751420.0, ans=0.125 2024-08-14 16:44:42,715 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
31 from LS+wenet, 8 from Vox, 34 fro AS 2024-08-14 16:44:45,257 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 14300, loss[loss=0.09174, beats_loss=0.01079, ecapa_loss=0.0002129, whisper_loss=0.07883, over 21136.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01068, ecapa_loss=0.0001532, whisper_loss=0.09027, over 3916483.50 frames. ], batch size: 92, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:44:46,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2751520.0, ans=0.125 2024-08-14 16:44:48,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2751520.0, ans=0.0 2024-08-14 16:45:02,429 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.444e+01 2.637e+01 2.966e+01 4.430e+01, threshold=5.274e+01, percent-clipped=0.0 2024-08-14 16:45:59,939 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 14350, loss[loss=0.1203, beats_loss=0.009571, ecapa_loss=0.0001591, whisper_loss=0.1092, over 22864.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01066, ecapa_loss=0.0001528, whisper_loss=0.09012, over 3913938.51 frames. ], batch size: 89, lr: 3.24e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:46:00,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2752020.0, ans=0.125 2024-08-14 16:46:01,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2752020.0, ans=0.0 2024-08-14 16:46:08,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2752020.0, ans=0.1 2024-08-14 16:46:11,725 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.83 vs. 
limit=12.0 2024-08-14 16:46:15,678 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 27 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-14 16:46:20,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2752120.0, ans=0.125 2024-08-14 16:46:34,091 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-14 16:47:10,253 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-14 16:47:15,810 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.91 vs. limit=15.0 2024-08-14 16:47:16,528 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 14400, loss[loss=0.1155, beats_loss=0.01148, ecapa_loss=0.0001557, whisper_loss=0.1025, over 22907.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01071, ecapa_loss=0.0001534, whisper_loss=0.0908, over 3948532.40 frames. ], batch size: 93, lr: 3.24e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:47:20,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2752520.0, ans=0.125 2024-08-14 16:47:33,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2752620.0, ans=0.1 2024-08-14 16:47:34,137 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.337e+01 2.636e+01 2.855e+01 4.364e+01, threshold=5.273e+01, percent-clipped=0.0 2024-08-14 16:47:51,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2752720.0, ans=0.125 2024-08-14 16:47:57,355 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
24 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-14 16:48:00,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2752720.0, ans=0.1 2024-08-14 16:48:24,920 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.58 vs. limit=6.0 2024-08-14 16:48:34,299 INFO [train_multi_KD3.py:1116] (2/4) Epoch 19, batch 14450, loss[loss=0.1099, beats_loss=0.01285, ecapa_loss=0.0001286, whisper_loss=0.0958, over 23848.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01071, ecapa_loss=0.0001541, whisper_loss=0.09056, over 3940848.39 frames. ], batch size: 95, lr: 3.24e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:48:41,800 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-14 16:48:52,040 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-14 16:48:56,242 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-14 16:48:56,984 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.38 vs. limit=22.5 2024-08-14 16:49:03,392 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 34 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-14 16:49:06,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2753220.0, ans=0.125 2024-08-14 16:49:23,470 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-14 16:50:13,406 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 0, loss[loss=0.09586, beats_loss=0.01065, ecapa_loss=0.0001259, whisper_loss=0.08395, over 20502.00 frames. 
], tot_loss[loss=0.09586, beats_loss=0.01065, ecapa_loss=0.0001259, whisper_loss=0.08395, over 20502.00 frames. ], batch size: 77, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:50:13,407 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-14 16:50:50,436 INFO [train_multi_KD3.py:1149] (2/4) Epoch 20, validation on ASR_libri: loss=0.2532, beats_loss=0, ecapa_loss=0.0005431, whisper_loss=0.2478, over 922467.00 frames. 2024-08-14 16:51:02,447 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.3425, 5.1746, 4.3177, 4.5097], device='cuda:2') 2024-08-14 16:51:07,210 INFO [train_multi_KD3.py:1149] (2/4) Epoch 20, validation on SV_voxceleb1: loss=0.004351, beats_loss=0, ecapa_loss=0.0004351, whisper_loss=0, over 939242.00 frames. 2024-08-14 16:52:53,152 INFO [train_multi_KD3.py:1149] (2/4) Epoch 20, validation on AT_audioset: loss=0.02356, beats_loss=0.02356, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 16:52:53,155 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-14 16:53:32,205 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-14 16:53:46,781 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.701e+01 2.322e+01 2.623e+01 2.945e+01 5.325e+01, threshold=5.246e+01, percent-clipped=1.0 2024-08-14 16:54:17,462 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-14 16:54:32,315 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-14 16:54:50,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2753820.0, ans=0.125 2024-08-14 16:54:53,006 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
21 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-14 16:54:56,907 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 50, loss[loss=0.09658, beats_loss=0.009971, ecapa_loss=0.0001585, whisper_loss=0.08502, over 17445.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.009063, ecapa_loss=0.0001616, whisper_loss=0.09021, over 849977.61 frames. ], batch size: 72, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:55:17,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2753920.0, ans=0.125 2024-08-14 16:55:58,937 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 25 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-14 16:56:01,369 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.37 vs. limit=15.0 2024-08-14 16:56:12,104 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-14 16:56:23,640 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 29 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-14 16:56:33,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2754320.0, ans=0.2 2024-08-14 16:56:51,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2754420.0, ans=0.2 2024-08-14 16:56:52,174 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 100, loss[loss=0.1065, beats_loss=0.01086, ecapa_loss=0.0001149, whisper_loss=0.09448, over 18525.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.009386, ecapa_loss=0.0001575, whisper_loss=0.09123, over 1521434.08 frames. 
], batch size: 70, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:56:56,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2754420.0, ans=0.0 2024-08-14 16:57:15,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2754520.0, ans=0.09899494936611666 2024-08-14 16:57:33,406 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.34 vs. limit=22.5 2024-08-14 16:57:38,225 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.579e+01 2.856e+01 3.069e+01 3.660e+02, threshold=5.711e+01, percent-clipped=1.0 2024-08-14 16:57:40,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2754620.0, ans=0.0 2024-08-14 16:58:04,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2754720.0, ans=0.1 2024-08-14 16:58:10,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2754720.0, ans=0.125 2024-08-14 16:58:14,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2754720.0, ans=0.2 2024-08-14 16:58:24,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2754820.0, ans=0.0 2024-08-14 16:58:26,218 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.68 vs. limit=15.0 2024-08-14 16:58:33,709 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 16:58:35,241 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
24 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-14 16:58:38,845 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 150, loss[loss=0.1159, beats_loss=0.008832, ecapa_loss=0.0001338, whisper_loss=0.1057, over 18814.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.009518, ecapa_loss=0.000159, whisper_loss=0.09156, over 2027100.66 frames. ], batch size: 71, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:59:05,141 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-14 16:59:42,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2755220.0, ans=0.125 2024-08-14 16:59:49,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2755320.0, ans=0.125 2024-08-14 16:59:54,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2755320.0, ans=0.125 2024-08-14 16:59:56,654 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-14 17:00:03,753 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 200, loss[loss=0.09228, beats_loss=0.009201, ecapa_loss=0.000145, whisper_loss=0.08163, over 15264.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.009737, ecapa_loss=0.0001594, whisper_loss=0.0918, over 2406626.70 frames. ], batch size: 59, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:00:13,959 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-14 17:00:19,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2755520.0, ans=0.2 2024-08-14 17:00:33,403 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
23 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-14 17:00:36,635 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.515e+01 2.860e+01 3.195e+01 5.864e+01, threshold=5.719e+01, percent-clipped=1.0 2024-08-14 17:00:40,010 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 43 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-14 17:00:52,223 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.39 vs. limit=15.0 2024-08-14 17:00:54,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2755720.0, ans=0.1 2024-08-14 17:01:00,346 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-14 17:01:04,429 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 17:01:07,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2755820.0, ans=0.0 2024-08-14 17:01:14,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2755820.0, ans=0.2 2024-08-14 17:01:14,901 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.58 vs. limit=15.0 2024-08-14 17:01:18,194 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 250, loss[loss=0.1153, beats_loss=0.01094, ecapa_loss=0.0001703, whisper_loss=0.1026, over 19749.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.009905, ecapa_loss=0.0001585, whisper_loss=0.09277, over 2734225.35 frames. 
], batch size: 76, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:01:33,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2756020.0, ans=0.0 2024-08-14 17:01:34,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2756020.0, ans=0.1 2024-08-14 17:01:39,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2756020.0, ans=0.2 2024-08-14 17:01:43,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2756020.0, ans=0.125 2024-08-14 17:01:49,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2756120.0, ans=0.1 2024-08-14 17:01:56,665 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.70 vs. limit=15.0 2024-08-14 17:02:15,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2756320.0, ans=0.0 2024-08-14 17:02:30,035 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 300, loss[loss=0.1024, beats_loss=0.01087, ecapa_loss=0.0001563, whisper_loss=0.09, over 16370.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01016, ecapa_loss=0.000158, whisper_loss=0.09179, over 2977452.85 frames. ], batch size: 66, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:02:50,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2756520.0, ans=0.035 2024-08-14 17:02:59,784 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. 
limit=6.0 2024-08-14 17:03:00,060 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.324e+01 2.572e+01 2.876e+01 1.018e+02, threshold=5.143e+01, percent-clipped=1.0 2024-08-14 17:03:28,091 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.84 vs. limit=10.0 2024-08-14 17:03:41,679 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 350, loss[loss=0.1141, beats_loss=0.009598, ecapa_loss=0.0001962, whisper_loss=0.1025, over 20609.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01025, ecapa_loss=0.0001571, whisper_loss=0.09144, over 3163282.80 frames. ], batch size: 88, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:03:47,922 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-14 17:04:10,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2757120.0, ans=0.0 2024-08-14 17:04:11,567 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=15.0 2024-08-14 17:04:20,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2757120.0, ans=0.2 2024-08-14 17:04:50,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2757320.0, ans=0.0 2024-08-14 17:04:51,668 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 17:04:52,622 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 400, loss[loss=0.1121, beats_loss=0.01043, ecapa_loss=0.0001563, whisper_loss=0.1001, over 20873.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01028, ecapa_loss=0.0001562, whisper_loss=0.09073, over 3291985.62 frames. 
], batch size: 80, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:05:04,784 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 24 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-14 17:05:05,939 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 17:05:09,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2757520.0, ans=0.0 2024-08-14 17:05:09,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2757520.0, ans=0.125 2024-08-14 17:05:09,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2757520.0, ans=10.0 2024-08-14 17:05:15,582 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-14 17:05:23,273 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.267e+01 2.550e+01 2.888e+01 2.244e+02, threshold=5.100e+01, percent-clipped=1.0 2024-08-14 17:05:24,753 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-14 17:05:31,385 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 17:05:37,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2757720.0, ans=0.0 2024-08-14 17:05:54,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2757820.0, ans=0.125 2024-08-14 17:06:07,297 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 450, loss[loss=0.1023, beats_loss=0.01125, ecapa_loss=0.0001368, whisper_loss=0.08968, over 20856.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01024, ecapa_loss=0.0001565, whisper_loss=0.09133, over 3408947.12 frames. 
], batch size: 80, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:06:09,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2757920.0, ans=0.2 2024-08-14 17:06:18,601 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.503e+05 2024-08-14 17:06:22,489 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-14 17:06:24,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2758020.0, ans=0.0 2024-08-14 17:06:28,956 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 27 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-14 17:06:31,767 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-14 17:06:33,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2758020.0, ans=0.0 2024-08-14 17:06:37,239 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.35 vs. limit=15.0 2024-08-14 17:07:08,270 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 17:07:11,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2758320.0, ans=0.125 2024-08-14 17:07:29,573 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 500, loss[loss=0.09433, beats_loss=0.01046, ecapa_loss=0.0001375, whisper_loss=0.08249, over 20253.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01031, ecapa_loss=0.0001555, whisper_loss=0.0905, over 3496006.66 frames. ], batch size: 79, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:07:29,768 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
15 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-14 17:07:51,421 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-14 17:08:03,996 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.300e+01 2.536e+01 2.836e+01 8.494e+01, threshold=5.071e+01, percent-clipped=3.0 2024-08-14 17:08:06,532 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 13 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-14 17:08:15,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2758620.0, ans=0.125 2024-08-14 17:08:25,465 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 33 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 17:08:39,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2758820.0, ans=0.0 2024-08-14 17:08:51,818 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 550, loss[loss=0.0895, beats_loss=0.01014, ecapa_loss=0.0001885, whisper_loss=0.07747, over 19089.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.0001551, whisper_loss=0.09024, over 3547064.18 frames. ], batch size: 79, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:08:56,268 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2024-08-14 17:09:00,248 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-14 17:09:07,601 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 17:09:16,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2759020.0, ans=0.2 2024-08-14 17:09:23,434 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
27 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-14 17:09:25,378 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-14 17:09:54,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2759220.0, ans=0.0 2024-08-14 17:10:01,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2759320.0, ans=0.1 2024-08-14 17:10:05,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2759320.0, ans=0.2 2024-08-14 17:10:05,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2759320.0, ans=0.125 2024-08-14 17:10:18,432 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 600, loss[loss=0.08938, beats_loss=0.0127, ecapa_loss=0.0001481, whisper_loss=0.07519, over 17388.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01038, ecapa_loss=0.0001545, whisper_loss=0.09059, over 3611041.36 frames. ], batch size: 69, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:10:26,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2759420.0, ans=0.0 2024-08-14 17:10:27,857 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-14 17:10:46,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2759520.0, ans=0.0 2024-08-14 17:10:46,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2759520.0, ans=0.2 2024-08-14 17:10:48,185 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
27 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-14 17:10:55,789 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.286e+01 2.611e+01 2.966e+01 2.824e+02, threshold=5.221e+01, percent-clipped=2.0 2024-08-14 17:10:56,029 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-14 17:11:06,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2759620.0, ans=0.125 2024-08-14 17:11:06,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2759620.0, ans=0.125 2024-08-14 17:11:12,040 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 17:11:31,032 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-14 17:11:32,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2759820.0, ans=0.2 2024-08-14 17:11:37,344 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.64 vs. limit=10.0 2024-08-14 17:11:45,490 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 650, loss[loss=0.09196, beats_loss=0.01002, ecapa_loss=0.0001426, whisper_loss=0.08051, over 21481.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01042, ecapa_loss=0.0001546, whisper_loss=0.0904, over 3679915.43 frames. ], batch size: 81, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:11:54,008 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. 
limit=6.0 2024-08-14 17:12:17,909 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.58 vs. limit=15.0 2024-08-14 17:12:27,805 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 17:12:32,381 WARNING [optim.py:496] (2/4) Scaling gradients by 0.059259023517370224, model_norm_threshold=52.210243225097656 2024-08-14 17:12:32,545 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.800e+04, grad_sumsq=2.525e+04, orig_rms_sq=3.485e+00 2024-08-14 17:12:50,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2760220.0, ans=0.1 2024-08-14 17:12:55,242 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-14 17:13:03,499 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 16 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-14 17:13:05,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2760320.0, ans=0.125 2024-08-14 17:13:11,491 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 700, loss[loss=0.1072, beats_loss=0.01111, ecapa_loss=0.0001561, whisper_loss=0.09453, over 15693.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01052, ecapa_loss=0.0001536, whisper_loss=0.09036, over 3698740.19 frames. ], batch size: 63, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:13:43,871 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
26 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-14 17:13:46,874 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.684e+01 2.378e+01 2.624e+01 2.914e+01 8.811e+02, threshold=5.248e+01, percent-clipped=3.0 2024-08-14 17:13:50,614 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-14 17:14:17,498 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-14 17:14:21,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2760820.0, ans=0.2 2024-08-14 17:14:36,172 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 750, loss[loss=0.1052, beats_loss=0.006974, ecapa_loss=0.0001618, whisper_loss=0.09661, over 16626.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01057, ecapa_loss=0.0001516, whisper_loss=0.0901, over 3739953.22 frames. ], batch size: 63, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:14:53,185 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 20 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-14 17:15:05,058 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 17:15:35,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2761220.0, ans=0.0 2024-08-14 17:15:35,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2761220.0, ans=0.0 2024-08-14 17:15:52,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2761320.0, ans=0.0 2024-08-14 17:16:00,663 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 800, loss[loss=0.08747, beats_loss=0.009906, ecapa_loss=0.0002288, whisper_loss=0.07528, over 20256.00 frames. 
], tot_loss[loss=0.1012, beats_loss=0.01061, ecapa_loss=0.0001519, whisper_loss=0.08911, over 3750729.78 frames. ], batch size: 89, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:16:05,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2761420.0, ans=10.0 2024-08-14 17:16:07,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2761420.0, ans=0.125 2024-08-14 17:16:23,612 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.76 vs. limit=15.0 2024-08-14 17:16:33,154 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.284e+01 2.457e+01 2.814e+01 4.816e+01, threshold=4.915e+01, percent-clipped=0.0 2024-08-14 17:16:44,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2761620.0, ans=0.5 2024-08-14 17:16:58,697 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.33 vs. limit=12.0 2024-08-14 17:17:01,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2761820.0, ans=0.0 2024-08-14 17:17:02,878 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 17:17:18,547 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 850, loss[loss=0.1025, beats_loss=0.008035, ecapa_loss=0.0001592, whisper_loss=0.09288, over 18407.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01056, ecapa_loss=0.0001501, whisper_loss=0.08904, over 3779705.99 frames. 
], batch size: 73, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:17:19,957 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.43 vs. limit=15.0 2024-08-14 17:17:27,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2761920.0, ans=0.125 2024-08-14 17:17:38,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2762020.0, ans=0.0 2024-08-14 17:17:48,957 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.18 vs. limit=15.0 2024-08-14 17:17:50,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2762120.0, ans=0.125 2024-08-14 17:18:00,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2762120.0, ans=0.1 2024-08-14 17:18:05,435 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.56 vs. limit=15.0 2024-08-14 17:18:10,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2762220.0, ans=0.2 2024-08-14 17:18:19,699 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 20 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-14 17:18:38,328 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 17:18:43,488 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 900, loss[loss=0.107, beats_loss=0.01036, ecapa_loss=0.0001565, whisper_loss=0.09511, over 18651.00 frames. 
], tot_loss[loss=0.1011, beats_loss=0.01054, ecapa_loss=0.0001504, whisper_loss=0.08904, over 3795151.77 frames. ], batch size: 72, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:19:19,440 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 22 from Vox, 18 fro AS 2024-08-14 17:19:21,194 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.291e+01 2.536e+01 2.780e+01 9.206e+01, threshold=5.071e+01, percent-clipped=1.0 2024-08-14 17:19:22,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2762620.0, ans=0.125 2024-08-14 17:19:28,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2762620.0, ans=0.0 2024-08-14 17:19:36,478 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 39 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-14 17:19:37,609 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2024-08-14 17:20:06,489 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 950, loss[loss=0.1119, beats_loss=0.009068, ecapa_loss=0.0001241, whisper_loss=0.1016, over 15894.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01058, ecapa_loss=0.000149, whisper_loss=0.08955, over 3808002.92 frames. ], batch size: 59, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:20:10,166 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.38 vs. 
limit=15.0 2024-08-14 17:20:13,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2762920.0, ans=0.125 2024-08-14 17:20:20,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2762920.0, ans=0.1 2024-08-14 17:20:25,646 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 17:20:29,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2763020.0, ans=0.0 2024-08-14 17:20:33,682 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-14 17:20:41,088 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.241e-01 2024-08-14 17:20:53,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2763120.0, ans=0.2 2024-08-14 17:21:17,478 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0 2024-08-14 17:21:31,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2763220.0, ans=0.0 2024-08-14 17:21:35,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2763320.0, ans=0.05 2024-08-14 17:21:54,386 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 1000, loss[loss=0.1062, beats_loss=0.01053, ecapa_loss=0.0001426, whisper_loss=0.09423, over 19928.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01061, ecapa_loss=0.0001486, whisper_loss=0.08948, over 3819937.37 frames. 
], batch size: 77, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:22:03,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2763420.0, ans=0.125 2024-08-14 17:22:08,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2763420.0, ans=0.035 2024-08-14 17:22:10,465 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.90 vs. limit=12.0 2024-08-14 17:22:33,129 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.269e+01 2.540e+01 2.773e+01 4.748e+01, threshold=5.079e+01, percent-clipped=0.0 2024-08-14 17:22:36,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2763620.0, ans=0.1 2024-08-14 17:22:40,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2763620.0, ans=15.0 2024-08-14 17:22:46,962 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 17:23:21,343 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=7.580e+01 2024-08-14 17:23:36,622 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 1050, loss[loss=0.1148, beats_loss=0.009413, ecapa_loss=0.0001743, whisper_loss=0.1036, over 17447.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01057, ecapa_loss=0.0001487, whisper_loss=0.08982, over 3801319.17 frames. ], batch size: 69, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:23:40,681 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
33 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-14 17:23:42,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2763920.0, ans=0.2 2024-08-14 17:23:46,156 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.09 vs. limit=15.0 2024-08-14 17:24:26,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2764120.0, ans=0.125 2024-08-14 17:24:29,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2764120.0, ans=0.125 2024-08-14 17:24:43,025 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 17:24:51,579 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 17:25:24,036 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 16 from LS+wenet, 12 from Vox, 39 fro AS 2024-08-14 17:25:36,567 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 1100, loss[loss=0.1206, beats_loss=0.01053, ecapa_loss=0.0001444, whisper_loss=0.1086, over 23493.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01058, ecapa_loss=0.0001479, whisper_loss=0.09011, over 3797579.27 frames. ], batch size: 91, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:26:29,713 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.358e+01 2.558e+01 2.908e+01 1.671e+02, threshold=5.116e+01, percent-clipped=1.0 2024-08-14 17:26:45,144 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 21 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-14 17:27:32,208 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
28 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-14 17:27:39,306 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 1150, loss[loss=0.07422, beats_loss=0.01185, ecapa_loss=0.0001624, whisper_loss=0.06075, over 18799.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01069, ecapa_loss=0.000148, whisper_loss=0.08912, over 3836004.40 frames. ], batch size: 77, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:27:42,020 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-14 17:28:39,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2765120.0, ans=0.025 2024-08-14 17:28:42,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2765120.0, ans=0.125 2024-08-14 17:29:02,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2765220.0, ans=0.125 2024-08-14 17:29:24,222 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 1200, loss[loss=0.07828, beats_loss=0.01106, ecapa_loss=0.0001657, whisper_loss=0.06557, over 15617.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0107, ecapa_loss=0.0001489, whisper_loss=0.08912, over 3836918.82 frames. ], batch size: 64, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:29:27,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2765420.0, ans=0.0 2024-08-14 17:29:29,329 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.66 vs. 
limit=15.0 2024-08-14 17:29:34,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2765420.0, ans=0.2 2024-08-14 17:29:50,239 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.80 vs. limit=15.0 2024-08-14 17:29:54,662 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.420e+01 2.672e+01 3.121e+01 5.638e+01, threshold=5.344e+01, percent-clipped=1.0 2024-08-14 17:30:38,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2765920.0, ans=0.05 2024-08-14 17:30:38,754 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 1250, loss[loss=0.0693, beats_loss=0.01325, ecapa_loss=0.0001133, whisper_loss=0.05492, over 17273.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01073, ecapa_loss=0.0001497, whisper_loss=0.0893, over 3827508.45 frames. ], batch size: 70, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:30:42,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2765920.0, ans=0.1 2024-08-14 17:31:39,802 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 17:31:47,093 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 25 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-14 17:31:50,557 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.49 vs. limit=15.0 2024-08-14 17:31:58,376 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 1300, loss[loss=0.09547, beats_loss=0.01412, ecapa_loss=0.0001321, whisper_loss=0.08002, over 22035.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01069, ecapa_loss=0.0001501, whisper_loss=0.08933, over 3833053.22 frames. 
], batch size: 89, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:32:01,726 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=15.0 2024-08-14 17:32:02,516 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-14 17:32:05,336 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0 2024-08-14 17:32:12,956 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 17:32:14,209 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 17:32:19,209 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-14 17:32:25,927 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 23 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-14 17:32:29,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2766620.0, ans=0.125 2024-08-14 17:32:31,530 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.335e+01 2.518e+01 2.895e+01 4.834e+01, threshold=5.035e+01, percent-clipped=0.0 2024-08-14 17:33:10,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2766820.0, ans=0.0 2024-08-14 17:33:16,257 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
21 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-14 17:33:16,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2766920.0, ans=0.1 2024-08-14 17:33:17,804 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 1350, loss[loss=0.08699, beats_loss=0.01078, ecapa_loss=0.0001466, whisper_loss=0.07474, over 22378.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01061, ecapa_loss=0.0001503, whisper_loss=0.08909, over 3835617.78 frames. ], batch size: 91, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:33:20,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2766920.0, ans=0.1 2024-08-14 17:33:38,126 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 28 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-14 17:34:01,854 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 8 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-14 17:34:06,285 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 17:34:13,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2767220.0, ans=0.0 2024-08-14 17:34:21,681 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 11 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-14 17:34:21,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2767220.0, ans=0.125 2024-08-14 17:34:32,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2767320.0, ans=0.2 2024-08-14 17:34:42,721 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 1400, loss[loss=0.1129, beats_loss=0.008972, ecapa_loss=0.0001598, whisper_loss=0.1024, over 19369.00 frames. 
], tot_loss[loss=0.1014, beats_loss=0.01063, ecapa_loss=0.0001505, whisper_loss=0.08924, over 3844409.13 frames. ], batch size: 75, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:34:43,777 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.02 vs. limit=15.0 2024-08-14 17:34:50,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2767420.0, ans=0.125 2024-08-14 17:35:15,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2767620.0, ans=0.125 2024-08-14 17:35:18,458 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.291e+01 2.563e+01 2.822e+01 1.881e+02, threshold=5.126e+01, percent-clipped=2.0 2024-08-14 17:35:24,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2767620.0, ans=0.0 2024-08-14 17:35:26,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2767620.0, ans=0.125 2024-08-14 17:35:28,194 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.76 vs. 
limit=15.0 2024-08-14 17:35:39,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2767720.0, ans=0.0 2024-08-14 17:35:39,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2767720.0, ans=0.0 2024-08-14 17:35:46,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2767720.0, ans=0.0 2024-08-14 17:35:49,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2767820.0, ans=0.125 2024-08-14 17:35:50,318 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.11 vs. limit=15.0 2024-08-14 17:35:55,413 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 17:36:41,299 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 1450, loss[loss=0.09929, beats_loss=0.01021, ecapa_loss=0.0001431, whisper_loss=0.08765, over 17115.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01059, ecapa_loss=0.0001504, whisper_loss=0.0896, over 3844316.01 frames. ], batch size: 68, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:36:52,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2767920.0, ans=0.125 2024-08-14 17:36:58,239 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.46 vs. 
limit=15.0 2024-08-14 17:37:06,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2768020.0, ans=0.07 2024-08-14 17:37:11,999 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.18 vs. limit=15.0 2024-08-14 17:37:12,905 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-14 17:37:23,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2768120.0, ans=0.1 2024-08-14 17:37:29,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2768220.0, ans=0.125 2024-08-14 17:37:38,251 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 23 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 17:37:48,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2768320.0, ans=0.125 2024-08-14 17:38:03,751 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 1500, loss[loss=0.08345, beats_loss=0.01137, ecapa_loss=0.0001382, whisper_loss=0.07069, over 21767.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01062, ecapa_loss=0.0001492, whisper_loss=0.08892, over 3843602.20 frames. 
], batch size: 83, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:38:20,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2768520.0, ans=0.95 2024-08-14 17:38:30,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2768520.0, ans=0.1 2024-08-14 17:38:36,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2768620.0, ans=0.2 2024-08-14 17:38:37,798 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.247e+01 2.495e+01 2.740e+01 8.085e+01, threshold=4.990e+01, percent-clipped=1.0 2024-08-14 17:38:44,611 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 17:38:48,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2768620.0, ans=0.0 2024-08-14 17:38:51,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2768720.0, ans=0.125 2024-08-14 17:39:04,911 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 17:39:08,924 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.07 vs. limit=15.0 2024-08-14 17:39:09,415 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
21 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-14 17:39:21,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2768820.0, ans=0.1 2024-08-14 17:39:23,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2768820.0, ans=0.1 2024-08-14 17:39:26,015 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 1550, loss[loss=0.1014, beats_loss=0.009666, ecapa_loss=0.0001624, whisper_loss=0.09008, over 21918.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01067, ecapa_loss=0.0001487, whisper_loss=0.08864, over 3869152.03 frames. ], batch size: 90, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:39:48,577 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-14 17:40:03,745 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 17:40:20,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2769220.0, ans=0.125 2024-08-14 17:40:34,851 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-14 17:40:42,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2769320.0, ans=0.2 2024-08-14 17:40:45,599 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 1600, loss[loss=0.1265, beats_loss=0.007573, ecapa_loss=0.0001377, whisper_loss=0.1176, over 20242.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01063, ecapa_loss=0.0001474, whisper_loss=0.08879, over 3865503.36 frames. 
], batch size: 77, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:40:46,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2769420.0, ans=0.125 2024-08-14 17:40:56,276 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.03 vs. limit=22.5 2024-08-14 17:40:56,774 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 29 from LS+wenet, 10 from Vox, 35 fro AS 2024-08-14 17:41:09,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2769520.0, ans=0.1 2024-08-14 17:41:17,983 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+01 2.359e+01 2.603e+01 2.856e+01 4.128e+01, threshold=5.205e+01, percent-clipped=0.0 2024-08-14 17:41:23,447 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 14 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-14 17:41:29,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2769620.0, ans=0.0 2024-08-14 17:41:31,016 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-14 17:41:36,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2769720.0, ans=0.1 2024-08-14 17:41:43,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2769720.0, ans=0.125 2024-08-14 17:41:45,200 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.71 vs. 
limit=15.0 2024-08-14 17:41:46,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2769820.0, ans=0.125 2024-08-14 17:41:56,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2769820.0, ans=0.125 2024-08-14 17:42:01,588 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 1650, loss[loss=0.08842, beats_loss=0.01274, ecapa_loss=0.000147, whisper_loss=0.07421, over 21940.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01062, ecapa_loss=0.0001484, whisper_loss=0.08889, over 3854023.16 frames. ], batch size: 91, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:42:19,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2770020.0, ans=0.0 2024-08-14 17:42:25,054 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-14 17:42:37,993 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-14 17:42:38,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2770120.0, ans=0.125 2024-08-14 17:42:43,132 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.43 vs. limit=15.0 2024-08-14 17:43:12,955 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 22 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-14 17:43:13,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2770320.0, ans=0.125 2024-08-14 17:43:18,622 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 1700, loss[loss=0.09482, beats_loss=0.008517, ecapa_loss=0.0001601, whisper_loss=0.0847, over 14917.00 frames. 
], tot_loss[loss=0.1012, beats_loss=0.01063, ecapa_loss=0.0001485, whisper_loss=0.08905, over 3851254.99 frames. ], batch size: 59, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:43:23,019 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.25 vs. limit=15.0 2024-08-14 17:43:51,112 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.335e+01 2.576e+01 2.933e+01 1.462e+02, threshold=5.153e+01, percent-clipped=1.0 2024-08-14 17:43:51,268 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 28 from LS+wenet, 25 from Vox, 32 from AS 2024-08-14 17:43:56,813 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 18 from Vox, 43 from AS 2024-08-14 17:44:15,097 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 17 from Vox, 35 from AS 2024-08-14 17:44:33,635 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 18 from Vox, 34 from AS 2024-08-14 17:44:34,720 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 1750, loss[loss=0.1182, beats_loss=0.0114, ecapa_loss=0.0001506, whisper_loss=0.1053, over 20044.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01066, ecapa_loss=0.0001487, whisper_loss=0.08921, over 3853525.91 frames. ], batch size: 81, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:44:36,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2770920.0, ans=0.1 2024-08-14 17:44:51,514 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 25 from Vox, 31 from AS 2024-08-14 17:44:56,974 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 21 from Vox, 42 from AS 2024-08-14 17:44:59,938 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.63 vs. 
limit=15.0 2024-08-14 17:45:01,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2771020.0, ans=0.1 2024-08-14 17:45:08,803 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=15.0 2024-08-14 17:45:21,001 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 15 from Vox, 50 from AS 2024-08-14 17:45:21,612 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.59 vs. limit=10.0 2024-08-14 17:45:26,466 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 20 from Vox, 37 from AS 2024-08-14 17:45:34,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2771320.0, ans=0.0 2024-08-14 17:45:34,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2771320.0, ans=0.125 2024-08-14 17:45:49,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2771420.0, ans=0.125 2024-08-14 17:45:50,240 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 1800, loss[loss=0.1008, beats_loss=0.01008, ecapa_loss=0.0001128, whisper_loss=0.08955, over 18801.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01066, ecapa_loss=0.0001484, whisper_loss=0.08888, over 3843117.44 frames. ], batch size: 70, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:45:51,929 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 from AS 2024-08-14 17:45:55,305 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
18 from LS+wenet, 24 from Vox, 30 from AS 2024-08-14 17:46:02,192 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 22 from Vox, 19 from AS 2024-08-14 17:46:07,199 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=15.0 2024-08-14 17:46:20,869 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 from AS 2024-08-14 17:46:22,145 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.297e+01 2.564e+01 2.917e+01 8.345e+01, threshold=5.127e+01, percent-clipped=1.0 2024-08-14 17:46:27,188 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 from AS 2024-08-14 17:46:41,415 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 27 from Vox, 29 from AS 2024-08-14 17:46:44,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2771720.0, ans=0.125 2024-08-14 17:46:45,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2771720.0, ans=10.0 2024-08-14 17:46:47,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2771720.0, ans=0.125 2024-08-14 17:47:06,290 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 1850, loss[loss=0.08652, beats_loss=0.01118, ecapa_loss=0.0002007, whisper_loss=0.07333, over 15657.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01056, ecapa_loss=0.0001492, whisper_loss=0.08861, over 3804210.07 frames. 
], batch size: 66, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:47:12,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2771920.0, ans=0.1 2024-08-14 17:47:12,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2771920.0, ans=0.125 2024-08-14 17:47:18,421 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.47 vs. limit=15.0 2024-08-14 17:47:34,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2772020.0, ans=0.125 2024-08-14 17:47:34,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2772020.0, ans=0.0 2024-08-14 17:47:36,442 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 22 from LS+wenet, 13 from Vox, 22 from AS 2024-08-14 17:47:42,150 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.24 vs. limit=15.0 2024-08-14 17:47:45,251 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.52 vs. limit=15.0 2024-08-14 17:47:48,850 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 22 from LS+wenet, 29 from Vox, 34 from AS 2024-08-14 17:48:11,454 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
22 from LS+wenet, 14 from Vox, 28 from AS 2024-08-14 17:48:13,071 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 17:48:14,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2772320.0, ans=0.125 2024-08-14 17:48:21,562 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 1900, loss[loss=0.1204, beats_loss=0.008712, ecapa_loss=0.0001486, whisper_loss=0.1102, over 22504.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01063, ecapa_loss=0.0001488, whisper_loss=0.08848, over 3797378.31 frames. ], batch size: 89, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:48:30,570 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.45 vs. limit=22.5 2024-08-14 17:48:34,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2772420.0, ans=0.1 2024-08-14 17:48:36,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2772520.0, ans=0.2 2024-08-14 17:48:50,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2772520.0, ans=0.125 2024-08-14 17:48:54,139 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+01 2.272e+01 2.538e+01 2.800e+01 8.979e+01, threshold=5.075e+01, percent-clipped=2.0 2024-08-14 17:48:56,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2772620.0, ans=0.125 2024-08-14 17:48:59,697 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
18 from LS+wenet, 18 from Vox, 26 from AS 2024-08-14 17:49:03,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2772620.0, ans=0.1 2024-08-14 17:49:08,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2772720.0, ans=0.05 2024-08-14 17:49:26,887 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.068e+05 2024-08-14 17:49:37,921 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 1950, loss[loss=0.1013, beats_loss=0.01114, ecapa_loss=0.0001381, whisper_loss=0.08879, over 20708.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01059, ecapa_loss=0.0001487, whisper_loss=0.08851, over 3768568.87 frames. ], batch size: 81, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:49:49,524 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 15 from LS+wenet, 17 from Vox, 32 from AS 2024-08-14 17:49:54,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2773020.0, ans=0.0 2024-08-14 17:50:21,106 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 32 from LS+wenet, 20 from Vox, 29 from AS 2024-08-14 17:50:21,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2773120.0, ans=0.1 2024-08-14 17:50:39,382 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 20 from Vox, 32 from AS 2024-08-14 17:50:50,523 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 19 from Vox, 39 from AS 2024-08-14 17:50:56,311 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 2000, loss[loss=0.1011, beats_loss=0.01242, ecapa_loss=0.0001355, whisper_loss=0.08735, over 22010.00 frames. 
], tot_loss[loss=0.1011, beats_loss=0.01062, ecapa_loss=0.0001483, whisper_loss=0.08905, over 3782243.64 frames. ], batch size: 89, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:51:03,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2773420.0, ans=0.1 2024-08-14 17:51:20,234 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 41 from LS+wenet, 25 from Vox, 23 from AS 2024-08-14 17:51:20,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2773520.0, ans=0.125 2024-08-14 17:51:23,724 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 24 from Vox, 30 from AS 2024-08-14 17:51:29,329 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.744e+01 2.380e+01 2.636e+01 2.886e+01 1.186e+02, threshold=5.271e+01, percent-clipped=1.0 2024-08-14 17:51:39,455 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-14 17:51:43,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2773720.0, ans=0.0 2024-08-14 17:52:03,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2773820.0, ans=0.125 2024-08-14 17:52:09,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2773820.0, ans=0.125 2024-08-14 17:52:14,395 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 2050, loss[loss=0.1067, beats_loss=0.01135, ecapa_loss=0.0001396, whisper_loss=0.09398, over 15679.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01066, ecapa_loss=0.0001488, whisper_loss=0.08879, over 3801938.01 frames. 
], batch size: 62, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:52:24,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2773920.0, ans=0.125 2024-08-14 17:52:30,388 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 from AS 2024-08-14 17:52:30,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2774020.0, ans=0.125 2024-08-14 17:52:47,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2774120.0, ans=0.125 2024-08-14 17:53:08,637 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 15 from Vox, 35 from AS 2024-08-14 17:53:16,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2774320.0, ans=0.125 2024-08-14 17:53:30,589 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 2100, loss[loss=0.09595, beats_loss=0.0105, ecapa_loss=0.0001731, whisper_loss=0.08372, over 21494.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01066, ecapa_loss=0.0001483, whisper_loss=0.08944, over 3838013.46 frames. ], batch size: 91, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:53:48,299 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 26 from Vox, 29 from AS 2024-08-14 17:54:03,468 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.339e+01 2.575e+01 2.832e+01 4.254e+01, threshold=5.150e+01, percent-clipped=0.0 2024-08-14 17:54:44,324 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 15 from LS+wenet, 24 from Vox, 39 from AS 2024-08-14 17:54:48,979 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 2150, loss[loss=0.1098, beats_loss=0.008949, ecapa_loss=0.0001558, whisper_loss=0.09928, over 16257.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01069, ecapa_loss=0.0001486, whisper_loss=0.09006, over 3841888.89 frames. ], batch size: 64, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:54:52,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2774920.0, ans=0.0 2024-08-14 17:55:09,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2775020.0, ans=0.125 2024-08-14 17:55:15,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2775020.0, ans=0.125 2024-08-14 17:55:31,156 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.04 vs. limit=10.0 2024-08-14 17:55:31,862 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 31 from LS+wenet, 14 from Vox, 29 from AS 2024-08-14 17:55:46,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2775220.0, ans=0.125 2024-08-14 17:55:51,064 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 21 from Vox, 21 from AS 2024-08-14 17:56:06,066 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 2200, loss[loss=0.1, beats_loss=0.009599, ecapa_loss=0.0001629, whisper_loss=0.08882, over 21309.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01068, ecapa_loss=0.0001486, whisper_loss=0.09026, over 3804483.41 frames. ], batch size: 87, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:56:08,144 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.41 vs. 
limit=15.0 2024-08-14 17:56:34,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2775620.0, ans=0.0 2024-08-14 17:56:36,800 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.388e+01 2.686e+01 3.163e+01 6.240e+01, threshold=5.371e+01, percent-clipped=1.0 2024-08-14 17:56:41,445 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 20 from LS+wenet, 14 from Vox, 20 from AS 2024-08-14 17:56:58,282 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.71 vs. limit=8.0 2024-08-14 17:57:13,717 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.90 vs. limit=15.0 2024-08-14 17:57:20,458 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 23 from Vox, 33 from AS 2024-08-14 17:57:20,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2775920.0, ans=0.2 2024-08-14 17:57:21,396 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 2250, loss[loss=0.1239, beats_loss=0.009345, ecapa_loss=0.000157, whisper_loss=0.113, over 23641.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01083, ecapa_loss=0.0001476, whisper_loss=0.08982, over 3812208.47 frames. ], batch size: 92, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:57:23,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2775920.0, ans=0.125 2024-08-14 17:57:45,303 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 27 from LS+wenet, 12 from Vox, 39 from AS 2024-08-14 17:58:00,246 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
25 from LS+wenet, 26 from Vox, 37 from AS 2024-08-14 17:58:00,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2776120.0, ans=0.09899494936611666 2024-08-14 17:58:01,583 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 13 from LS+wenet, 17 from Vox, 26 from AS 2024-08-14 17:58:16,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2776220.0, ans=0.125 2024-08-14 17:58:23,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2776220.0, ans=0.125 2024-08-14 17:58:40,930 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 2300, loss[loss=0.1007, beats_loss=0.007638, ecapa_loss=0.0001632, whisper_loss=0.09146, over 18378.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0109, ecapa_loss=0.0001481, whisper_loss=0.08988, over 3819904.29 frames. ], batch size: 70, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:58:45,836 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 17:58:57,865 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 23 from LS+wenet, 23 from Vox, 49 from AS 2024-08-14 17:59:08,271 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 11 from Vox, 27 from AS 2024-08-14 17:59:12,862 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.401e+01 2.652e+01 3.055e+01 1.168e+02, threshold=5.304e+01, percent-clipped=4.0 2024-08-14 17:59:25,489 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 23 from Vox, 34 from AS 2024-08-14 17:59:28,043 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 27 from Vox, 39 from AS 2024-08-14 17:59:43,659 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
27 from LS+wenet, 14 from Vox, 37 from AS 2024-08-14 17:59:47,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2776820.0, ans=0.0 2024-08-14 17:59:57,814 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 2350, loss[loss=0.08588, beats_loss=0.01078, ecapa_loss=0.0001534, whisper_loss=0.07357, over 20214.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01077, ecapa_loss=0.0001496, whisper_loss=0.09046, over 3812087.14 frames. ], batch size: 82, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:00:04,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2776920.0, ans=0.2 2024-08-14 18:00:19,396 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.17 vs. limit=15.0 2024-08-14 18:00:25,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2777020.0, ans=0.0 2024-08-14 18:00:37,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2777120.0, ans=0.125 2024-08-14 18:01:09,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2777320.0, ans=0.125 2024-08-14 18:01:19,125 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 2400, loss[loss=0.1265, beats_loss=0.007466, ecapa_loss=0.0001363, whisper_loss=0.1176, over 23019.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01071, ecapa_loss=0.00015, whisper_loss=0.09051, over 3864031.68 frames. 
], batch size: 83, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:01:23,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2777420.0, ans=0.125 2024-08-14 18:01:28,451 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 30 from LS+wenet, 30 from Vox, 18 from AS 2024-08-14 18:01:32,643 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 21 from Vox, 28 from AS 2024-08-14 18:01:37,726 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 18 from LS+wenet, 17 from Vox, 43 from AS 2024-08-14 18:01:38,617 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.96 vs. limit=22.5 2024-08-14 18:01:44,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2777520.0, ans=0.125 2024-08-14 18:01:52,704 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.341e+01 2.588e+01 3.015e+01 2.629e+02, threshold=5.175e+01, percent-clipped=2.0 2024-08-14 18:01:58,970 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.43 vs. 
limit=15.0 2024-08-14 18:02:13,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2777720.0, ans=0.125 2024-08-14 18:02:15,372 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 18:02:15,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2777720.0, ans=0.125 2024-08-14 18:02:22,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2777720.0, ans=0.125 2024-08-14 18:02:42,128 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 2450, loss[loss=0.1105, beats_loss=0.01066, ecapa_loss=0.0001431, whisper_loss=0.09837, over 23583.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01067, ecapa_loss=0.0001508, whisper_loss=0.09034, over 3876636.92 frames. ], batch size: 91, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:02:48,136 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 26 from Vox, 37 from AS 2024-08-14 18:03:11,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2778020.0, ans=0.125 2024-08-14 18:03:18,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2778120.0, ans=0.0 2024-08-14 18:03:21,307 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 18:03:24,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2778120.0, ans=0.125 2024-08-14 18:03:42,705 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
17 from LS+wenet, 19 from Vox, 21 from AS 2024-08-14 18:03:43,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2778220.0, ans=0.07 2024-08-14 18:03:47,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2778320.0, ans=0.125 2024-08-14 18:04:03,845 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 2500, loss[loss=0.09368, beats_loss=0.01116, ecapa_loss=0.0001483, whisper_loss=0.08104, over 15926.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.0001514, whisper_loss=0.09083, over 3866083.98 frames. ], batch size: 64, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:04:06,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2778420.0, ans=0.2 2024-08-14 18:04:15,810 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 15 from LS+wenet, 25 from Vox, 25 from AS 2024-08-14 18:04:25,486 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
29 from LS+wenet, 21 from Vox, 43 from AS 2024-08-14 18:04:39,414 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.393e+01 2.682e+01 2.958e+01 4.919e+01, threshold=5.365e+01, percent-clipped=0.0 2024-08-14 18:04:48,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2778620.0, ans=0.1 2024-08-14 18:04:48,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2778620.0, ans=0.025 2024-08-14 18:04:52,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2778720.0, ans=0.125 2024-08-14 18:04:54,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2778720.0, ans=0.0 2024-08-14 18:04:56,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2778720.0, ans=0.125 2024-08-14 18:05:04,709 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 27 from Vox, 43 from AS 2024-08-14 18:05:24,283 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.38 vs. limit=15.0 2024-08-14 18:05:24,984 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 2550, loss[loss=0.08008, beats_loss=0.01323, ecapa_loss=0.0001622, whisper_loss=0.06522, over 20641.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0106, ecapa_loss=0.0001522, whisper_loss=0.09096, over 3890415.57 frames. 
], batch size: 88, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:05:31,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2778920.0, ans=0.0 2024-08-14 18:06:02,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2779120.0, ans=0.0 2024-08-14 18:06:14,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=2779220.0, ans=0.2 2024-08-14 18:06:23,812 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 24 from Vox, 38 from AS 2024-08-14 18:06:24,308 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.24 vs. limit=15.0 2024-08-14 18:06:44,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=2779320.0, ans=15.0 2024-08-14 18:06:46,512 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 2600, loss[loss=0.08334, beats_loss=0.009134, ecapa_loss=0.0001742, whisper_loss=0.07247, over 18141.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01061, ecapa_loss=0.0001512, whisper_loss=0.09105, over 3889931.73 frames. 
], batch size: 73, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:06:50,000 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 18:06:50,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2779420.0, ans=0.125 2024-08-14 18:06:55,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=2779420.0, ans=0.1 2024-08-14 18:06:55,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2779420.0, ans=0.125 2024-08-14 18:06:58,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2779420.0, ans=0.0 2024-08-14 18:07:21,164 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.268e+01 2.541e+01 2.782e+01 4.582e+01, threshold=5.082e+01, percent-clipped=0.0 2024-08-14 18:07:27,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2779620.0, ans=0.04949747468305833 2024-08-14 18:07:28,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2779620.0, ans=0.0 2024-08-14 18:07:30,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2779620.0, ans=0.125 2024-08-14 18:07:33,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2779620.0, ans=0.0 2024-08-14 18:07:38,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2779720.0, ans=0.0 2024-08-14 18:07:44,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, 
batch_count=2779720.0, ans=0.1 2024-08-14 18:07:48,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2779720.0, ans=0.1 2024-08-14 18:08:00,348 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 20 from Vox, 36 from AS 2024-08-14 18:08:01,797 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 20 from Vox, 45 from AS 2024-08-14 18:08:07,910 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 2650, loss[loss=0.08384, beats_loss=0.016, ecapa_loss=0.0001598, whisper_loss=0.06624, over 12784.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.0001516, whisper_loss=0.0909, over 3871752.66 frames. ], batch size: 55, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:08:23,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2780020.0, ans=0.1 2024-08-14 18:08:24,709 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 15 from Vox, 31 from AS 2024-08-14 18:09:22,459 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.67 vs. limit=10.0 2024-08-14 18:09:24,624 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 32 from Vox, 32 from AS 2024-08-14 18:09:29,937 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 2700, loss[loss=0.09357, beats_loss=0.01044, ecapa_loss=0.0001301, whisper_loss=0.08183, over 18532.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01062, ecapa_loss=0.0001526, whisper_loss=0.09133, over 3881858.84 frames. 
], batch size: 74, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:09:39,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2780420.0, ans=0.2 2024-08-14 18:09:56,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2780520.0, ans=0.0 2024-08-14 18:10:03,550 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.336e+01 2.550e+01 2.927e+01 5.134e+01, threshold=5.101e+01, percent-clipped=1.0 2024-08-14 18:10:07,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2780620.0, ans=0.2 2024-08-14 18:10:27,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=2780720.0, ans=0.2 2024-08-14 18:10:40,537 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-14 18:10:49,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2780820.0, ans=0.07 2024-08-14 18:10:52,246 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 2750, loss[loss=0.1174, beats_loss=0.01056, ecapa_loss=0.0001465, whisper_loss=0.1054, over 23189.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0106, ecapa_loss=0.0001523, whisper_loss=0.09121, over 3863174.17 frames. 
], batch size: 94, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:11:27,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2781120.0, ans=0.0 2024-08-14 18:11:37,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2781120.0, ans=0.125 2024-08-14 18:11:46,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2781220.0, ans=0.125 2024-08-14 18:11:57,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2781220.0, ans=0.1 2024-08-14 18:12:11,369 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.69 vs. limit=6.0 2024-08-14 18:12:16,183 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 2800, loss[loss=0.1286, beats_loss=0.009252, ecapa_loss=0.0001337, whisper_loss=0.118, over 22056.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01061, ecapa_loss=0.0001504, whisper_loss=0.09193, over 3867424.26 frames. ], batch size: 79, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:12:26,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2781420.0, ans=0.125 2024-08-14 18:12:28,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2781420.0, ans=0.0 2024-08-14 18:12:29,558 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
23 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-14 18:12:29,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2781520.0, ans=0.2 2024-08-14 18:12:32,677 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.21 vs. limit=15.0 2024-08-14 18:12:44,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2781520.0, ans=0.125 2024-08-14 18:12:48,040 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.654e+01 2.377e+01 2.677e+01 2.938e+01 4.458e+01, threshold=5.354e+01, percent-clipped=0.0 2024-08-14 18:13:21,183 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 18:13:24,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=2781820.0, ans=0.02 2024-08-14 18:13:26,601 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-14 18:13:33,324 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 2850, loss[loss=0.08632, beats_loss=0.01043, ecapa_loss=0.0001831, whisper_loss=0.07406, over 20842.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01066, ecapa_loss=0.0001512, whisper_loss=0.09159, over 3887102.40 frames. 
], batch size: 88, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:13:41,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2781920.0, ans=0.09899494936611666 2024-08-14 18:13:46,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2782020.0, ans=0.125 2024-08-14 18:13:53,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2782020.0, ans=0.2 2024-08-14 18:14:08,293 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.09 vs. limit=22.5 2024-08-14 18:14:15,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2782120.0, ans=0.0 2024-08-14 18:14:16,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2782220.0, ans=0.0 2024-08-14 18:14:17,697 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-14 18:14:19,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2782220.0, ans=0.125 2024-08-14 18:14:33,441 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 35 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-14 18:14:41,020 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 22 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-14 18:14:42,212 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.47 vs. limit=15.0 2024-08-14 18:14:48,073 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 2900, loss[loss=0.09913, beats_loss=0.01027, ecapa_loss=0.0001726, whisper_loss=0.08713, over 14831.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01072, ecapa_loss=0.0001515, whisper_loss=0.09148, over 3898643.94 frames. ], batch size: 61, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:15:13,673 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2024-08-14 18:15:18,366 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.302e+01 2.501e+01 2.806e+01 3.501e+01, threshold=5.003e+01, percent-clipped=0.0 2024-08-14 18:15:21,581 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 23 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 18:15:32,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2782720.0, ans=0.2 2024-08-14 18:15:37,635 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 32 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 18:16:03,368 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 2950, loss[loss=0.1102, beats_loss=0.01115, ecapa_loss=0.000138, whisper_loss=0.09768, over 20047.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0107, ecapa_loss=0.0001511, whisper_loss=0.09199, over 3928283.44 frames. ], batch size: 79, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:16:13,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2782920.0, ans=0.125 2024-08-14 18:16:24,332 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
22 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-14 18:16:27,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2783020.0, ans=0.125 2024-08-14 18:16:32,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2783120.0, ans=0.125 2024-08-14 18:16:35,992 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-14 18:16:37,244 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.40 vs. limit=15.0 2024-08-14 18:16:39,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2783120.0, ans=0.05 2024-08-14 18:17:18,106 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 3000, loss[loss=0.07692, beats_loss=0.01156, ecapa_loss=0.00015, whisper_loss=0.06386, over 19743.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01068, ecapa_loss=0.0001515, whisper_loss=0.09235, over 3968090.98 frames. ], batch size: 81, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:17:18,106 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-14 18:17:58,786 INFO [train_multi_KD3.py:1149] (2/4) Epoch 20, validation on ASR_libri: loss=0.2511, beats_loss=0, ecapa_loss=0.0005401, whisper_loss=0.2457, over 922467.00 frames. 2024-08-14 18:18:19,618 INFO [train_multi_KD3.py:1149] (2/4) Epoch 20, validation on SV_voxceleb1: loss=0.004329, beats_loss=0, ecapa_loss=0.0004329, whisper_loss=0, over 939242.00 frames. 2024-08-14 18:20:16,633 INFO [train_multi_KD3.py:1149] (2/4) Epoch 20, validation on AT_audioset: loss=0.02338, beats_loss=0.02338, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
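The three validation summaries just above (ASR_libri, SV_voxceleb1, AT_audioset) are consistent with the overall `loss` being a weighted sum of the per-teacher components, with the ECAPA term scaled by 10.0. That scale is an inference from the logged numbers and the `scale_10.0` fragment of the exp_dir name, not something this log chunk states explicitly. A minimal sketch that reproduces the logged totals:

```python
def combined_loss(beats_loss: float, ecapa_loss: float, whisper_loss: float,
                  beats_scale: float = 1.0,
                  ecapa_scale: float = 10.0,   # assumption: inferred from the log, not confirmed
                  whisper_scale: float = 1.0) -> float:
    """Weighted sum of the per-teacher KD losses, as the logged totals suggest."""
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)

# Validation entries from the log; each task reports 0 for components it does not use.
asr = combined_loss(0.0, 0.0005401, 0.2457)   # ASR_libri:    logged loss = 0.2511
sv = combined_loss(0.0, 0.0004329, 0.0)       # SV_voxceleb1: logged loss = 0.004329
at = combined_loss(0.02338, 0.0, 0.0)         # AT_audioset:  logged loss = 0.02338
```

The same weighting also reproduces the training-side totals, e.g. tot_loss=0.1031 at Epoch 20 batch 2650 from beats_loss=0.01063, ecapa_loss=0.0001516, whisper_loss=0.0909.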
2024-08-14 18:20:16,637 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-14 18:20:47,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2783620.0, ans=0.2 2024-08-14 18:20:48,254 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.403e+01 2.631e+01 2.938e+01 2.975e+02, threshold=5.261e+01, percent-clipped=1.0 2024-08-14 18:20:50,472 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.09 vs. limit=12.0 2024-08-14 18:21:22,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2783820.0, ans=0.125 2024-08-14 18:21:29,232 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.57 vs. limit=6.0 2024-08-14 18:21:30,846 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 3050, loss[loss=0.1045, beats_loss=0.009766, ecapa_loss=0.000162, whisper_loss=0.09314, over 20664.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01065, ecapa_loss=0.0001527, whisper_loss=0.09251, over 3964973.79 frames. ], batch size: 80, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:21:52,600 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 18:21:54,084 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-14 18:22:00,214 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
27 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 18:22:06,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2784120.0, ans=0.125 2024-08-14 18:22:06,714 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.82 vs. limit=22.5 2024-08-14 18:22:11,840 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-14 18:22:19,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2784220.0, ans=0.1 2024-08-14 18:22:21,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2784220.0, ans=0.0 2024-08-14 18:22:31,959 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.11 vs. limit=15.0 2024-08-14 18:22:35,492 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 32 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 18:22:35,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2784320.0, ans=0.2 2024-08-14 18:22:45,162 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 22 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-14 18:22:46,327 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 3100, loss[loss=0.1122, beats_loss=0.01216, ecapa_loss=0.0001561, whisper_loss=0.09852, over 15071.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01065, ecapa_loss=0.0001543, whisper_loss=0.09298, over 3938771.96 frames. ], batch size: 61, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:22:55,234 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
22 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-14 18:23:04,846 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 18:23:13,522 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-14 18:23:16,107 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.365e+01 2.545e+01 2.848e+01 4.706e+01, threshold=5.089e+01, percent-clipped=0.0 2024-08-14 18:23:20,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2784620.0, ans=0.1 2024-08-14 18:23:35,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=2784720.0, ans=0.02 2024-08-14 18:23:39,907 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.12 vs. limit=10.0 2024-08-14 18:23:40,548 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 18:23:46,000 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 18:23:56,747 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 3150, loss[loss=0.1268, beats_loss=0.009278, ecapa_loss=0.0001739, whisper_loss=0.1158, over 14782.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01071, ecapa_loss=0.0001546, whisper_loss=0.09221, over 3938012.70 frames. ], batch size: 57, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:23:59,893 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 22 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-14 18:24:04,004 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 23 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-14 18:24:10,758 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
20 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-14 18:24:23,864 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-14 18:24:44,958 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 18 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-14 18:25:01,944 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-14 18:25:06,067 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 3200, loss[loss=0.09415, beats_loss=0.01375, ecapa_loss=0.0001454, whisper_loss=0.07895, over 22241.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01069, ecapa_loss=0.0001542, whisper_loss=0.09245, over 3881902.49 frames. ], batch size: 91, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:25:15,338 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=15.0 2024-08-14 18:25:16,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2785420.0, ans=0.125 2024-08-14 18:25:17,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2785420.0, ans=0.125 2024-08-14 18:25:35,322 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.274e+01 2.528e+01 2.834e+01 7.598e+01, threshold=5.056e+01, percent-clipped=2.0 2024-08-14 18:25:35,539 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 25 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 18:25:38,185 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-14 18:25:38,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2785620.0, ans=0.1 2024-08-14 18:26:15,105 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 3250, loss[loss=0.09863, beats_loss=0.01134, ecapa_loss=0.0001393, whisper_loss=0.08589, over 22311.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01069, ecapa_loss=0.0001548, whisper_loss=0.09186, over 3883421.06 frames. ], batch size: 89, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:26:19,291 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 27 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-14 18:26:21,840 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 18:26:30,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2786020.0, ans=0.1 2024-08-14 18:27:16,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2786320.0, ans=0.125 2024-08-14 18:27:22,308 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 3300, loss[loss=0.09811, beats_loss=0.01189, ecapa_loss=0.0001399, whisper_loss=0.08482, over 14335.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01067, ecapa_loss=0.0001549, whisper_loss=0.09164, over 3867707.92 frames. ], batch size: 54, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:27:32,914 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-14 18:27:46,156 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
20 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-14 18:27:51,322 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.316e+01 2.463e+01 2.771e+01 4.814e+01, threshold=4.926e+01, percent-clipped=0.0 2024-08-14 18:27:52,305 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.97 vs. limit=15.0 2024-08-14 18:28:00,918 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 25 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 18:28:06,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2786720.0, ans=0.125 2024-08-14 18:28:07,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2786720.0, ans=0.5 2024-08-14 18:28:11,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2786720.0, ans=0.125 2024-08-14 18:28:20,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2786820.0, ans=0.0 2024-08-14 18:28:21,100 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 15 from LS+wenet, 34 from Vox, 31 fro AS 2024-08-14 18:28:25,746 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.51 vs. limit=15.0 2024-08-14 18:28:30,059 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 3350, loss[loss=0.1231, beats_loss=0.009521, ecapa_loss=0.0001335, whisper_loss=0.1122, over 17218.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01067, ecapa_loss=0.0001542, whisper_loss=0.09153, over 3876855.79 frames. ], batch size: 67, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:28:30,300 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
8 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-14 18:28:35,003 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 29 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 18:28:38,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2786920.0, ans=0.125 2024-08-14 18:28:50,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2787020.0, ans=0.125 2024-08-14 18:29:02,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2787120.0, ans=0.125 2024-08-14 18:29:05,023 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-14 18:29:09,321 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 18:29:10,636 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 18:29:15,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2787220.0, ans=0.1 2024-08-14 18:29:20,324 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.55 vs. limit=15.0 2024-08-14 18:29:21,987 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-14 18:29:27,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2787320.0, ans=0.0 2024-08-14 18:29:39,659 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 3400, loss[loss=0.0983, beats_loss=0.009926, ecapa_loss=0.0001807, whisper_loss=0.08657, over 21845.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01079, ecapa_loss=0.0001529, whisper_loss=0.09087, over 3876696.55 frames. 
], batch size: 92, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:29:47,440 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.42 vs. limit=15.0 2024-08-14 18:29:54,213 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.46 vs. limit=15.0 2024-08-14 18:29:59,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2787520.0, ans=0.125 2024-08-14 18:30:04,666 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 18:30:07,949 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.360e+01 2.659e+01 3.040e+01 2.409e+02, threshold=5.318e+01, percent-clipped=1.0 2024-08-14 18:30:14,450 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.52 vs. limit=6.0 2024-08-14 18:30:30,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2787720.0, ans=0.125 2024-08-14 18:30:40,927 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.62 vs. limit=22.5 2024-08-14 18:30:41,084 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.60 vs. 
limit=12.0 2024-08-14 18:30:47,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2787920.0, ans=0.0 2024-08-14 18:30:48,153 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 3450, loss[loss=0.1133, beats_loss=0.009126, ecapa_loss=0.0001805, whisper_loss=0.1024, over 20031.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01079, ecapa_loss=0.000154, whisper_loss=0.09029, over 3859437.63 frames. ], batch size: 79, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:31:00,331 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 18:31:10,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2788020.0, ans=0.1 2024-08-14 18:31:14,603 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-14 18:31:14,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2788120.0, ans=0.125 2024-08-14 18:31:18,090 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.76 vs. limit=15.0 2024-08-14 18:31:25,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2788120.0, ans=0.1 2024-08-14 18:31:39,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2788220.0, ans=0.0 2024-08-14 18:31:54,865 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.73 vs. 
limit=10.0 2024-08-14 18:31:55,066 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 3500, loss[loss=0.1055, beats_loss=0.009737, ecapa_loss=0.0001802, whisper_loss=0.094, over 14345.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01078, ecapa_loss=0.0001537, whisper_loss=0.09047, over 3851211.35 frames. ], batch size: 55, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:32:06,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2788420.0, ans=0.035 2024-08-14 18:32:19,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2788520.0, ans=0.0 2024-08-14 18:32:22,193 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 18:32:22,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2788620.0, ans=0.125 2024-08-14 18:32:23,467 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.371e+01 2.585e+01 2.886e+01 6.376e+01, threshold=5.170e+01, percent-clipped=1.0 2024-08-14 18:32:33,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2788620.0, ans=0.0 2024-08-14 18:32:45,160 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.39 vs. limit=6.0 2024-08-14 18:32:53,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2788820.0, ans=0.0 2024-08-14 18:33:03,408 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 3550, loss[loss=0.1232, beats_loss=0.009993, ecapa_loss=0.0001561, whisper_loss=0.1116, over 23614.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01079, ecapa_loss=0.0001541, whisper_loss=0.09003, over 3874767.31 frames. 
], batch size: 91, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:33:04,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2788920.0, ans=0.125 2024-08-14 18:33:18,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2789020.0, ans=0.09899494936611666 2024-08-14 18:33:54,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2789220.0, ans=0.125 2024-08-14 18:34:08,906 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-14 18:34:10,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2789420.0, ans=0.05 2024-08-14 18:34:11,513 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 3600, loss[loss=0.08993, beats_loss=0.01144, ecapa_loss=0.0001661, whisper_loss=0.07683, over 16102.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01075, ecapa_loss=0.0001542, whisper_loss=0.09104, over 3872789.18 frames. ], batch size: 66, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:34:13,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2789420.0, ans=0.2 2024-08-14 18:34:39,594 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.737e+01 2.323e+01 2.540e+01 2.892e+01 4.287e+01, threshold=5.080e+01, percent-clipped=0.0 2024-08-14 18:34:40,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2789620.0, ans=0.125 2024-08-14 18:34:46,892 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
18 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-14 18:35:10,974 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.25 vs. limit=22.5 2024-08-14 18:35:14,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2789820.0, ans=0.125 2024-08-14 18:35:15,593 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-14 18:35:19,219 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 3650, loss[loss=0.07951, beats_loss=0.01118, ecapa_loss=0.0001631, whisper_loss=0.06669, over 13923.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01069, ecapa_loss=0.0001549, whisper_loss=0.09058, over 3846687.65 frames. ], batch size: 58, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:35:21,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2789920.0, ans=0.125 2024-08-14 18:35:23,282 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-14 18:35:23,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2789920.0, ans=0.035 2024-08-14 18:35:25,387 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.14 vs. 
limit=15.0 2024-08-14 18:35:26,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2789920.0, ans=0.0 2024-08-14 18:35:38,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2790020.0, ans=0.1 2024-08-14 18:35:56,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2790120.0, ans=0.0 2024-08-14 18:36:02,576 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 18 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 18:36:05,610 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.25 vs. limit=12.0 2024-08-14 18:36:26,215 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 3700, loss[loss=0.1198, beats_loss=0.007484, ecapa_loss=0.000165, whisper_loss=0.1106, over 16089.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01062, ecapa_loss=0.0001547, whisper_loss=0.09051, over 3804843.68 frames. ], batch size: 61, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:36:33,606 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.14 vs. limit=15.0 2024-08-14 18:36:43,697 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 17 from LS+wenet, 28 from Vox, 25 fro AS 2024-08-14 18:36:53,398 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-14 18:36:54,539 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.282e+01 2.542e+01 2.895e+01 4.405e+01, threshold=5.084e+01, percent-clipped=0.0 2024-08-14 18:36:56,543 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
28 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 18:37:09,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2790720.0, ans=0.035 2024-08-14 18:37:33,797 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 3750, loss[loss=0.09189, beats_loss=0.01156, ecapa_loss=0.0001474, whisper_loss=0.07886, over 20661.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01063, ecapa_loss=0.000155, whisper_loss=0.09015, over 3824835.81 frames. ], batch size: 85, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:37:44,810 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-14 18:37:47,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2791020.0, ans=0.2 2024-08-14 18:37:52,734 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 18:38:10,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2791120.0, ans=0.125 2024-08-14 18:38:12,623 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.76 vs. limit=6.0 2024-08-14 18:38:16,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2791220.0, ans=0.0 2024-08-14 18:38:21,535 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-14 18:38:24,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2791220.0, ans=0.2 2024-08-14 18:38:28,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2791320.0, ans=0.1 2024-08-14 18:38:36,179 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
27 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-14 18:38:41,684 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 3800, loss[loss=0.1007, beats_loss=0.01026, ecapa_loss=0.0001388, whisper_loss=0.08905, over 15909.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01069, ecapa_loss=0.0001543, whisper_loss=0.09, over 3823159.86 frames. ], batch size: 61, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:39:00,174 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-14 18:39:09,493 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.945e+01 2.383e+01 2.672e+01 2.913e+01 4.805e+01, threshold=5.345e+01, percent-clipped=0.0 2024-08-14 18:39:18,290 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.137e+00 2024-08-14 18:39:22,204 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-14 18:39:48,321 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 3850, loss[loss=0.1377, beats_loss=0.008087, ecapa_loss=0.0001832, whisper_loss=0.1278, over 21962.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01062, ecapa_loss=0.0001547, whisper_loss=0.09032, over 3815431.06 frames. ], batch size: 88, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:39:51,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2791920.0, ans=0.125 2024-08-14 18:39:52,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2791920.0, ans=0.0 2024-08-14 18:39:54,737 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.30 vs. 
limit=15.0 2024-08-14 18:40:02,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2792020.0, ans=0.125 2024-08-14 18:40:05,477 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 18:40:08,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2792020.0, ans=0.125 2024-08-14 18:40:34,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2792220.0, ans=0.0 2024-08-14 18:40:55,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2792420.0, ans=0.0 2024-08-14 18:40:55,863 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 3900, loss[loss=0.09629, beats_loss=0.008747, ecapa_loss=0.0001964, whisper_loss=0.08558, over 16441.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01069, ecapa_loss=0.000155, whisper_loss=0.09012, over 3840821.63 frames. ], batch size: 67, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:40:57,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2792420.0, ans=0.125 2024-08-14 18:41:07,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2792420.0, ans=0.125 2024-08-14 18:41:24,893 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.416e+01 2.719e+01 3.088e+01 3.540e+02, threshold=5.437e+01, percent-clipped=1.0 2024-08-14 18:41:25,991 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.30 vs. 
limit=15.0 2024-08-14 18:41:36,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2792720.0, ans=0.125 2024-08-14 18:41:39,833 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 15 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 18:41:47,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2792720.0, ans=0.0 2024-08-14 18:42:03,900 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 3950, loss[loss=0.1082, beats_loss=0.009843, ecapa_loss=0.0001678, whisper_loss=0.09671, over 22685.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01071, ecapa_loss=0.0001555, whisper_loss=0.09025, over 3857490.78 frames. ], batch size: 90, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:42:04,876 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.74 vs. limit=10.0 2024-08-14 18:42:14,591 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-14 18:42:26,680 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 18:42:41,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2793120.0, ans=0.125 2024-08-14 18:42:52,512 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 18:42:56,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2793320.0, ans=0.1 2024-08-14 18:43:10,676 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 4000, loss[loss=0.1157, beats_loss=0.01003, ecapa_loss=0.0001452, whisper_loss=0.1043, over 18554.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01069, ecapa_loss=0.0001548, whisper_loss=0.09032, over 3855041.23 frames. ], batch size: 71, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:43:11,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2793420.0, ans=0.125 2024-08-14 18:43:15,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2793420.0, ans=0.125 2024-08-14 18:43:17,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2793420.0, ans=0.125 2024-08-14 18:43:17,998 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0 2024-08-14 18:43:24,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2793520.0, ans=0.0 2024-08-14 18:43:33,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2793520.0, ans=0.125 2024-08-14 18:43:39,516 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.723e+01 2.386e+01 2.659e+01 3.102e+01 4.594e+01, threshold=5.318e+01, percent-clipped=0.0 2024-08-14 18:43:39,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2793620.0, ans=0.1 2024-08-14 18:43:48,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2793620.0, ans=0.0 2024-08-14 18:44:16,665 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.53 vs. 
limit=12.0 2024-08-14 18:44:19,637 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 4050, loss[loss=0.1233, beats_loss=0.006334, ecapa_loss=0.0001763, whisper_loss=0.1153, over 22229.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01064, ecapa_loss=0.0001547, whisper_loss=0.09062, over 3858425.27 frames. ], batch size: 88, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:44:26,473 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-14 18:44:35,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2794020.0, ans=0.125 2024-08-14 18:44:47,023 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-14 18:44:52,411 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 18:44:52,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2794120.0, ans=0.0 2024-08-14 18:44:56,534 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 18:44:57,757 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 32 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-14 18:45:12,941 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-14 18:45:22,613 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 10 from Vox, 37 fro AS 2024-08-14 18:45:22,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2794320.0, ans=0.1 2024-08-14 18:45:27,661 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 4100, loss[loss=0.105, beats_loss=0.009921, ecapa_loss=0.0001448, whisper_loss=0.0936, over 20777.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01066, ecapa_loss=0.0001538, whisper_loss=0.09105, over 3883047.49 frames. ], batch size: 83, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:45:38,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2794420.0, ans=0.0 2024-08-14 18:45:46,261 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 31 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-14 18:45:48,878 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 14 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-14 18:45:57,084 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.054e+01 2.354e+01 2.603e+01 2.918e+01 6.130e+01, threshold=5.207e+01, percent-clipped=1.0 2024-08-14 18:46:07,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2794620.0, ans=0.125 2024-08-14 18:46:14,836 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 30 from LS+wenet, 10 from Vox, 15 fro AS 2024-08-14 18:46:16,757 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=22.5 2024-08-14 18:46:23,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2794820.0, ans=0.125 2024-08-14 18:46:31,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2794820.0, ans=0.0 2024-08-14 18:46:35,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2794920.0, ans=0.0 2024-08-14 18:46:36,142 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 4150, loss[loss=0.1062, beats_loss=0.0117, ecapa_loss=0.0001355, whisper_loss=0.09315, over 22183.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.01069, ecapa_loss=0.0001538, whisper_loss=0.09139, over 3880102.89 frames. ], batch size: 90, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:46:36,355 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 30 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-14 18:46:52,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2795020.0, ans=0.125 2024-08-14 18:47:07,860 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.07 vs. limit=22.5 2024-08-14 18:47:13,552 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.84 vs. limit=15.0 2024-08-14 18:47:16,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2795220.0, ans=0.1 2024-08-14 18:47:19,845 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-14 18:47:44,055 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 4200, loss[loss=0.08381, beats_loss=0.009765, ecapa_loss=0.0001701, whisper_loss=0.07235, over 14415.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01065, ecapa_loss=0.0001556, whisper_loss=0.09131, over 3886297.82 frames. ], batch size: 58, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:47:45,552 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
18 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 18:47:55,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2795420.0, ans=0.0 2024-08-14 18:48:04,918 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.035e-03 2024-08-14 18:48:12,377 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.416e+01 2.672e+01 2.930e+01 6.892e+01, threshold=5.345e+01, percent-clipped=1.0 2024-08-14 18:48:21,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=2795620.0, ans=15.0 2024-08-14 18:48:21,792 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.17 vs. limit=15.0 2024-08-14 18:48:28,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2795720.0, ans=0.125 2024-08-14 18:48:28,511 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.34 vs. limit=22.5 2024-08-14 18:48:34,584 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 26 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-14 18:48:36,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2795720.0, ans=0.0 2024-08-14 18:48:45,500 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 18:48:52,310 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 4250, loss[loss=0.09785, beats_loss=0.01009, ecapa_loss=0.0001424, whisper_loss=0.08633, over 18082.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01061, ecapa_loss=0.0001545, whisper_loss=0.09156, over 3895196.78 frames. 
], batch size: 71, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:49:07,769 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=16.40 vs. limit=15.0 2024-08-14 18:49:12,469 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 18:49:17,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=2796020.0, ans=0.5 2024-08-14 18:49:18,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2796020.0, ans=0.125 2024-08-14 18:49:24,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2796120.0, ans=0.1 2024-08-14 18:49:27,169 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.142e-03 2024-08-14 18:49:42,011 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
30 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-14 18:49:47,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2796220.0, ans=0.0 2024-08-14 18:49:57,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2796320.0, ans=0.0 2024-08-14 18:49:59,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2796320.0, ans=0.1 2024-08-14 18:50:00,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2796320.0, ans=0.0 2024-08-14 18:50:02,764 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 4300, loss[loss=0.1176, beats_loss=0.01103, ecapa_loss=0.0001451, whisper_loss=0.1051, over 18663.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01067, ecapa_loss=0.0001545, whisper_loss=0.09096, over 3879551.46 frames. ], batch size: 72, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:50:10,065 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.46 vs. limit=12.0 2024-08-14 18:50:15,826 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2796420.0, ans=0.0 2024-08-14 18:50:28,734 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.08 vs. limit=15.0 2024-08-14 18:50:34,935 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.393e+01 2.675e+01 3.079e+01 4.317e+01, threshold=5.351e+01, percent-clipped=0.0 2024-08-14 18:50:45,132 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
27 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-14 18:50:48,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2796720.0, ans=0.2 2024-08-14 18:50:50,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2796720.0, ans=0.125 2024-08-14 18:50:50,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2796720.0, ans=0.0 2024-08-14 18:50:53,727 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.493e+05 2024-08-14 18:50:59,531 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-14 18:51:18,077 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 4350, loss[loss=0.1167, beats_loss=0.0096, ecapa_loss=0.0001633, whisper_loss=0.1054, over 16667.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01062, ecapa_loss=0.0001558, whisper_loss=0.09114, over 3869360.94 frames. ], batch size: 67, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:51:21,545 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-14 18:51:32,093 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-14 18:51:43,977 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=15.0 2024-08-14 18:52:00,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2797120.0, ans=0.0 2024-08-14 18:52:19,930 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
20 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 18:52:20,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2797320.0, ans=0.2 2024-08-14 18:52:31,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2797420.0, ans=0.125 2024-08-14 18:52:31,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2797420.0, ans=0.125 2024-08-14 18:52:32,667 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 4400, loss[loss=0.09259, beats_loss=0.0109, ecapa_loss=0.0001244, whisper_loss=0.08045, over 18375.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01055, ecapa_loss=0.0001565, whisper_loss=0.0911, over 3850041.68 frames. ], batch size: 71, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:52:46,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2797420.0, ans=0.125 2024-08-14 18:53:04,933 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.379e+01 2.659e+01 2.952e+01 7.187e+01, threshold=5.319e+01, percent-clipped=1.0 2024-08-14 18:53:15,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2797620.0, ans=0.125 2024-08-14 18:53:22,979 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-14 18:53:48,941 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 4450, loss[loss=0.08877, beats_loss=0.01107, ecapa_loss=0.0001712, whisper_loss=0.07599, over 21702.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01053, ecapa_loss=0.0001558, whisper_loss=0.09093, over 3884035.81 frames. 
], batch size: 91, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:53:49,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2797920.0, ans=0.0 2024-08-14 18:53:58,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2797920.0, ans=0.0 2024-08-14 18:54:01,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2797920.0, ans=0.0 2024-08-14 18:54:16,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2798020.0, ans=0.07 2024-08-14 18:54:17,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2798020.0, ans=0.125 2024-08-14 18:54:25,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2798120.0, ans=0.1 2024-08-14 18:54:29,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2798120.0, ans=0.1 2024-08-14 18:54:59,135 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-14 18:55:06,945 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 4500, loss[loss=0.07821, beats_loss=0.01332, ecapa_loss=0.0001711, whisper_loss=0.06318, over 15984.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01066, ecapa_loss=0.0001553, whisper_loss=0.09046, over 3911674.23 frames. 
], batch size: 70, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:55:18,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2798420.0, ans=0.0 2024-08-14 18:55:27,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2798520.0, ans=0.125 2024-08-14 18:55:42,769 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.288e+01 2.644e+01 2.918e+01 3.847e+02, threshold=5.287e+01, percent-clipped=3.0 2024-08-14 18:55:51,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2798620.0, ans=0.1 2024-08-14 18:56:04,521 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 27 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 18:56:15,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2798820.0, ans=0.2 2024-08-14 18:56:26,122 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 4550, loss[loss=0.09986, beats_loss=0.01405, ecapa_loss=0.0001147, whisper_loss=0.08467, over 23614.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01068, ecapa_loss=0.0001555, whisper_loss=0.09077, over 3911374.81 frames. ], batch size: 93, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:56:35,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2798920.0, ans=0.0 2024-08-14 18:56:36,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2798920.0, ans=0.0 2024-08-14 18:56:36,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2798920.0, ans=0.125 2024-08-14 18:56:42,613 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
18 from LS+wenet, 24 from Vox, 47 fro AS 2024-08-14 18:56:55,655 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.08 vs. limit=10.0 2024-08-14 18:56:57,970 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-14 18:57:23,429 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.63 vs. limit=15.0 2024-08-14 18:57:39,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2799320.0, ans=0.125 2024-08-14 18:57:39,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2799320.0, ans=0.2 2024-08-14 18:57:43,384 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 4600, loss[loss=0.1108, beats_loss=0.01005, ecapa_loss=0.0001503, whisper_loss=0.09923, over 18561.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01064, ecapa_loss=0.0001555, whisper_loss=0.09119, over 3919892.74 frames. ], batch size: 73, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:57:44,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2799420.0, ans=0.125 2024-08-14 18:57:45,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2799420.0, ans=0.2 2024-08-14 18:57:47,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2799420.0, ans=0.125 2024-08-14 18:57:48,763 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.97 vs. 
limit=15.0 2024-08-14 18:58:04,305 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 18:58:10,292 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 16 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-14 18:58:15,996 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.391e+01 2.667e+01 2.855e+01 4.020e+01, threshold=5.333e+01, percent-clipped=0.0 2024-08-14 18:58:32,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2799720.0, ans=0.1 2024-08-14 18:58:38,787 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.48 vs. limit=6.0 2024-08-14 18:58:58,043 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 4650, loss[loss=0.0868, beats_loss=0.0115, ecapa_loss=0.0001284, whisper_loss=0.07402, over 16609.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01063, ecapa_loss=0.0001558, whisper_loss=0.09052, over 3894470.28 frames. ], batch size: 66, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:59:14,003 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-14 18:59:17,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2800020.0, ans=0.125 2024-08-14 18:59:22,922 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 18:59:23,618 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.61 vs. limit=15.0 2024-08-14 18:59:54,139 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
29 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 19:00:07,030 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.531e-01 2024-08-14 19:00:08,484 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 22 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-14 19:00:17,646 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 4700, loss[loss=0.1007, beats_loss=0.01353, ecapa_loss=0.0001143, whisper_loss=0.08603, over 20472.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01069, ecapa_loss=0.0001548, whisper_loss=0.09048, over 3907487.57 frames. ], batch size: 80, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:00:26,458 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 16 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-14 19:00:28,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2800420.0, ans=0.0 2024-08-14 19:00:40,206 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-14 19:00:46,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2800620.0, ans=0.125 2024-08-14 19:00:49,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2800620.0, ans=0.1 2024-08-14 19:00:50,530 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.60 vs. 
limit=15.0 2024-08-14 19:00:50,847 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.338e+01 2.588e+01 2.905e+01 3.899e+01, threshold=5.177e+01, percent-clipped=0.0 2024-08-14 19:01:05,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2800720.0, ans=0.1 2024-08-14 19:01:09,240 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 27 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-14 19:01:30,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2800820.0, ans=0.125 2024-08-14 19:01:33,194 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 4750, loss[loss=0.1132, beats_loss=0.008788, ecapa_loss=0.0001815, whisper_loss=0.1026, over 20933.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01065, ecapa_loss=0.0001543, whisper_loss=0.09077, over 3898287.01 frames. ], batch size: 84, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:02:09,588 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.63 vs. limit=15.0 2024-08-14 19:02:23,021 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 19:02:40,448 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 17 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 19:02:43,392 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-14 19:02:45,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2801320.0, ans=0.0 2024-08-14 19:02:47,808 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 4800, loss[loss=0.108, beats_loss=0.007534, ecapa_loss=0.0001941, whisper_loss=0.09856, over 16359.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01071, ecapa_loss=0.0001541, whisper_loss=0.08994, over 3859073.47 frames. ], batch size: 64, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:03:14,853 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 33 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-14 19:03:16,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2801620.0, ans=0.2 2024-08-14 19:03:18,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2801620.0, ans=0.125 2024-08-14 19:03:20,465 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.344e+01 2.546e+01 2.876e+01 4.578e+02, threshold=5.092e+01, percent-clipped=1.0 2024-08-14 19:03:57,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2801820.0, ans=0.0 2024-08-14 19:04:01,154 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 4850, loss[loss=0.0993, beats_loss=0.01052, ecapa_loss=0.0001425, whisper_loss=0.08735, over 21221.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01076, ecapa_loss=0.0001536, whisper_loss=0.09012, over 3907569.82 frames. 
], batch size: 84, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:04:09,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2801920.0, ans=0.1 2024-08-14 19:04:10,811 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.055e+00 2024-08-14 19:04:21,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2802020.0, ans=0.0 2024-08-14 19:04:25,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2802020.0, ans=10.0 2024-08-14 19:04:48,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2802220.0, ans=0.0 2024-08-14 19:04:48,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2802220.0, ans=0.09899494936611666 2024-08-14 19:05:05,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2802320.0, ans=0.0 2024-08-14 19:05:17,514 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 4900, loss[loss=0.114, beats_loss=0.0119, ecapa_loss=0.0001412, whisper_loss=0.1007, over 17090.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01077, ecapa_loss=0.0001539, whisper_loss=0.09003, over 3863335.04 frames. ], batch size: 68, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:05:25,525 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-14 19:05:40,587 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 19:05:42,152 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
19 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-14 19:05:44,859 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-14 19:05:46,409 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-14 19:05:50,669 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 19:05:52,201 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.355e+01 2.636e+01 2.883e+01 6.029e+01, threshold=5.271e+01, percent-clipped=1.0 2024-08-14 19:05:57,279 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-14 19:06:01,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2802620.0, ans=0.125 2024-08-14 19:06:07,652 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-14 19:06:26,016 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-14 19:06:38,438 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 4950, loss[loss=0.1179, beats_loss=0.01059, ecapa_loss=0.0001556, whisper_loss=0.1057, over 23783.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01072, ecapa_loss=0.0001534, whisper_loss=0.09047, over 3878790.12 frames. 
], batch size: 92, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:06:39,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2802920.0, ans=0.125 2024-08-14 19:06:54,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2803020.0, ans=0.0 2024-08-14 19:07:01,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2803020.0, ans=0.125 2024-08-14 19:07:09,792 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.84 vs. limit=22.5 2024-08-14 19:07:42,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2803320.0, ans=0.125 2024-08-14 19:07:43,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2803320.0, ans=0.09899494936611666 2024-08-14 19:07:47,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2803320.0, ans=0.0 2024-08-14 19:07:54,044 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 5000, loss[loss=0.09238, beats_loss=0.01163, ecapa_loss=0.0001564, whisper_loss=0.07918, over 16031.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01078, ecapa_loss=0.000153, whisper_loss=0.09041, over 3837802.05 frames. ], batch size: 65, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:08:10,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2803520.0, ans=0.0 2024-08-14 19:08:21,827 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 19:08:25,936 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.350e+01 2.620e+01 2.995e+01 1.741e+02, threshold=5.241e+01, percent-clipped=2.0 2024-08-14 19:08:35,529 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 19:08:36,647 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-14 19:08:43,563 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 26 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-14 19:08:50,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2803720.0, ans=0.04949747468305833 2024-08-14 19:08:51,518 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.32 vs. limit=15.0 2024-08-14 19:08:53,603 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-14 19:09:06,232 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 5050, loss[loss=0.09773, beats_loss=0.01157, ecapa_loss=0.0001209, whisper_loss=0.08495, over 16191.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01081, ecapa_loss=0.0001538, whisper_loss=0.09007, over 3872188.76 frames. ], batch size: 62, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:09:11,072 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 19:09:26,761 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.99 vs. limit=12.0 2024-08-14 19:09:39,514 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.09 vs. limit=15.0 2024-08-14 19:09:56,082 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
28 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-14 19:09:59,100 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-14 19:10:01,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2804220.0, ans=0.2 2024-08-14 19:10:11,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2804320.0, ans=0.0 2024-08-14 19:10:21,044 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 5100, loss[loss=0.1161, beats_loss=0.01016, ecapa_loss=0.0001642, whisper_loss=0.1043, over 19204.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0108, ecapa_loss=0.0001534, whisper_loss=0.09036, over 3876088.11 frames. ], batch size: 77, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:10:45,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2804520.0, ans=0.1 2024-08-14 19:10:56,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2804620.0, ans=0.125 2024-08-14 19:10:56,838 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.368e+01 2.597e+01 2.934e+01 4.134e+01, threshold=5.194e+01, percent-clipped=0.0 2024-08-14 19:11:01,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2804620.0, ans=0.0 2024-08-14 19:11:04,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2804620.0, ans=0.1 2024-08-14 19:11:21,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2804720.0, ans=0.0 2024-08-14 19:11:22,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, 
batch_count=2804720.0, ans=0.0 2024-08-14 19:11:39,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2804920.0, ans=0.125 2024-08-14 19:11:40,538 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 5150, loss[loss=0.08804, beats_loss=0.009728, ecapa_loss=0.0001512, whisper_loss=0.0768, over 19023.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01075, ecapa_loss=0.0001534, whisper_loss=0.09107, over 3900420.60 frames. ], batch size: 76, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:11:45,251 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-14 19:11:52,956 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-14 19:12:26,556 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 20 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-14 19:12:34,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2805220.0, ans=0.125 2024-08-14 19:12:34,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2805220.0, ans=0.125 2024-08-14 19:12:37,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2805220.0, ans=0.1 2024-08-14 19:12:45,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2805320.0, ans=0.0 2024-08-14 19:12:46,314 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 11 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 19:12:54,916 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 5200, loss[loss=0.0981, beats_loss=0.01266, ecapa_loss=0.0001494, whisper_loss=0.08394, over 21110.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01079, ecapa_loss=0.0001531, whisper_loss=0.09042, over 3888043.04 frames. ], batch size: 87, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:12:55,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2805420.0, ans=0.0 2024-08-14 19:13:07,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2805420.0, ans=0.0 2024-08-14 19:13:25,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2805620.0, ans=0.125 2024-08-14 19:13:28,227 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.358e+01 2.582e+01 2.808e+01 4.877e+01, threshold=5.164e+01, percent-clipped=0.0 2024-08-14 19:13:33,217 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 19:13:37,628 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-14 19:13:38,108 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.42 vs. limit=15.0 2024-08-14 19:13:48,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2805720.0, ans=0.1 2024-08-14 19:13:49,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2805720.0, ans=0.125 2024-08-14 19:14:01,135 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.73 vs. limit=15.0 2024-08-14 19:14:10,357 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 5250, loss[loss=0.107, beats_loss=0.0104, ecapa_loss=0.0001419, whisper_loss=0.09521, over 22108.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01069, ecapa_loss=0.0001545, whisper_loss=0.09061, over 3889889.54 frames. ], batch size: 90, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:14:27,281 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 36 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-14 19:14:29,253 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.563e+01 2024-08-14 19:15:07,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2806220.0, ans=0.1 2024-08-14 19:15:07,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2806220.0, ans=0.125 2024-08-14 19:15:14,043 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.64 vs. limit=10.0 2024-08-14 19:15:17,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2806320.0, ans=0.0 2024-08-14 19:15:19,267 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=15.0 2024-08-14 19:15:27,943 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 5300, loss[loss=0.1111, beats_loss=0.01043, ecapa_loss=0.0001228, whisper_loss=0.0994, over 19284.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01058, ecapa_loss=0.0001553, whisper_loss=0.09113, over 3887047.26 frames. ], batch size: 75, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:15:29,369 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.02 vs. 
limit=10.0 2024-08-14 19:15:30,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2806420.0, ans=0.125 2024-08-14 19:15:37,271 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-14 19:16:00,923 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-14 19:16:01,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2806620.0, ans=0.125 2024-08-14 19:16:02,017 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.268e+01 2.456e+01 2.845e+01 4.034e+01, threshold=4.912e+01, percent-clipped=0.0 2024-08-14 19:16:05,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2806620.0, ans=0.2 2024-08-14 19:16:39,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2806820.0, ans=0.1 2024-08-14 19:16:40,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2806820.0, ans=0.125 2024-08-14 19:16:43,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2806820.0, ans=0.2 2024-08-14 19:16:45,781 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 5350, loss[loss=0.09629, beats_loss=0.01114, ecapa_loss=0.0001635, whisper_loss=0.08352, over 21026.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01062, ecapa_loss=0.0001548, whisper_loss=0.09033, over 3875111.49 frames. ], batch size: 85, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:16:46,740 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.06 vs. 
limit=15.0 2024-08-14 19:17:47,892 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-14 19:17:49,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2807220.0, ans=0.2 2024-08-14 19:18:01,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2807320.0, ans=0.0 2024-08-14 19:18:01,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2807320.0, ans=0.2 2024-08-14 19:18:13,089 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 5400, loss[loss=0.1028, beats_loss=0.009635, ecapa_loss=0.0001389, whisper_loss=0.09178, over 16632.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01062, ecapa_loss=0.0001536, whisper_loss=0.09005, over 3867263.00 frames. ], batch size: 63, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:18:23,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2807420.0, ans=0.0 2024-08-14 19:18:23,503 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 19:18:34,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2807520.0, ans=0.125 2024-08-14 19:18:35,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2807520.0, ans=0.07 2024-08-14 19:18:45,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2807620.0, ans=0.0 2024-08-14 19:18:47,458 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.20 vs. 
limit=12.0 2024-08-14 19:18:50,155 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.371e+01 2.761e+01 3.113e+01 5.866e+01, threshold=5.523e+01, percent-clipped=1.0 2024-08-14 19:18:50,822 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.38 vs. limit=22.5 2024-08-14 19:18:53,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2807620.0, ans=0.0 2024-08-14 19:19:09,217 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.58 vs. limit=15.0 2024-08-14 19:19:12,262 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-14 19:19:19,352 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 21 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-14 19:19:33,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2807820.0, ans=0.125 2024-08-14 19:19:43,229 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 5450, loss[loss=0.1041, beats_loss=0.01132, ecapa_loss=0.0001644, whisper_loss=0.09117, over 22076.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01071, ecapa_loss=0.000153, whisper_loss=0.09065, over 3896939.47 frames. ], batch size: 92, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:20:00,877 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 23 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-14 19:20:10,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2808020.0, ans=0.2 2024-08-14 19:20:14,850 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.44 vs. 
limit=15.0 2024-08-14 19:20:32,145 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0 2024-08-14 19:20:38,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2808120.0, ans=0.1 2024-08-14 19:20:41,459 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.83 vs. limit=22.5 2024-08-14 19:20:42,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2808220.0, ans=0.125 2024-08-14 19:20:45,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2808220.0, ans=0.125 2024-08-14 19:20:52,874 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 19:20:55,133 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 16 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-14 19:21:23,773 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 5500, loss[loss=0.1232, beats_loss=0.009474, ecapa_loss=0.0001497, whisper_loss=0.1123, over 19555.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01077, ecapa_loss=0.000153, whisper_loss=0.09065, over 3903282.89 frames. ], batch size: 80, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:21:33,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2808420.0, ans=0.1 2024-08-14 19:21:42,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2808520.0, ans=0.1 2024-08-14 19:21:44,642 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
28 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-14 19:21:49,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2808520.0, ans=0.125 2024-08-14 19:21:53,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2808520.0, ans=0.125 2024-08-14 19:22:09,215 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.424e+01 2.779e+01 3.099e+01 3.330e+02, threshold=5.557e+01, percent-clipped=2.0 2024-08-14 19:22:12,289 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-14 19:22:13,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2808620.0, ans=0.0 2024-08-14 19:22:13,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2808620.0, ans=0.1 2024-08-14 19:22:48,660 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 26 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-14 19:22:52,360 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 37 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-14 19:23:05,715 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 18 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-14 19:23:11,642 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 5550, loss[loss=0.09798, beats_loss=0.01072, ecapa_loss=0.000216, whisper_loss=0.0851, over 19898.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01068, ecapa_loss=0.0001541, whisper_loss=0.0907, over 3898835.04 frames. 
], batch size: 89, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:23:24,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2808920.0, ans=0.125 2024-08-14 19:23:42,789 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-14 19:24:03,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2809120.0, ans=0.125 2024-08-14 19:24:40,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2809320.0, ans=0.125 2024-08-14 19:24:46,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2809320.0, ans=0.1 2024-08-14 19:24:51,716 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 26 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 19:24:52,696 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 5600, loss[loss=0.1018, beats_loss=0.01006, ecapa_loss=0.0001648, whisper_loss=0.09004, over 19922.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.0001546, whisper_loss=0.09084, over 3919923.41 frames. ], batch size: 82, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:25:00,211 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 19:25:19,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2809520.0, ans=0.0 2024-08-14 19:25:24,320 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.292e+01 2.694e+01 2.993e+01 3.874e+01, threshold=5.387e+01, percent-clipped=0.0 2024-08-14 19:25:26,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2809620.0, ans=0.0 2024-08-14 19:25:33,416 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 19:25:57,805 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 22 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-14 19:25:58,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2809820.0, ans=10.0 2024-08-14 19:26:04,754 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 5650, loss[loss=0.09017, beats_loss=0.01222, ecapa_loss=0.0001785, whisper_loss=0.07617, over 21865.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01074, ecapa_loss=0.0001546, whisper_loss=0.09001, over 3903698.10 frames. 
], batch size: 96, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:26:05,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2809920.0, ans=0.125 2024-08-14 19:26:07,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2809920.0, ans=0.125 2024-08-14 19:26:08,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=2809920.0, ans=0.025 2024-08-14 19:26:14,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2809920.0, ans=0.1 2024-08-14 19:26:15,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2809920.0, ans=0.1 2024-08-14 19:26:24,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2810020.0, ans=0.125 2024-08-14 19:26:28,532 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 25 from Vox, 17 fro AS 2024-08-14 19:26:28,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2810020.0, ans=0.07 2024-08-14 19:26:39,232 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-14 19:26:42,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2810120.0, ans=0.0 2024-08-14 19:26:43,836 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
15 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-14 19:26:45,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2810120.0, ans=0.0 2024-08-14 19:26:58,350 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 17 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-14 19:27:08,125 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 19:27:08,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2810320.0, ans=0.125 2024-08-14 19:27:19,462 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 5700, loss[loss=0.1133, beats_loss=0.008949, ecapa_loss=0.0001579, whisper_loss=0.1028, over 22609.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01072, ecapa_loss=0.0001555, whisper_loss=0.09081, over 3913369.71 frames. ], batch size: 90, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:27:21,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2810420.0, ans=0.0 2024-08-14 19:27:36,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2810520.0, ans=0.09899494936611666 2024-08-14 19:27:36,993 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.03 vs. limit=15.0 2024-08-14 19:27:43,375 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
31 from LS+wenet, 25 from Vox, 32 from AS 2024-08-14 19:27:51,800 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.307e+01 2.514e+01 2.816e+01 4.087e+01, threshold=5.028e+01, percent-clipped=0.0 2024-08-14 19:27:54,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2810620.0, ans=0.0 2024-08-14 19:27:58,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2810620.0, ans=0.1 2024-08-14 19:28:09,804 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 from AS 2024-08-14 19:28:14,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2810720.0, ans=0.0 2024-08-14 19:28:21,654 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 22 from LS+wenet, 14 from Vox, 22 from AS 2024-08-14 19:28:32,845 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 5750, loss[loss=0.1065, beats_loss=0.01008, ecapa_loss=0.0001638, whisper_loss=0.09474, over 16471.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0107, ecapa_loss=0.0001549, whisper_loss=0.09135, over 3914515.34 frames. ], batch size: 66, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:28:35,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2810920.0, ans=0.0 2024-08-14 19:28:47,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2811020.0, ans=0.2 2024-08-14 19:28:59,238 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
21 from LS+wenet, 20 from Vox, 35 from AS 2024-08-14 19:29:06,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2811120.0, ans=0.04949747468305833 2024-08-14 19:29:07,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2811120.0, ans=0.125 2024-08-14 19:29:07,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2811120.0, ans=0.125 2024-08-14 19:29:11,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2811120.0, ans=0.05 2024-08-14 19:29:11,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2811120.0, ans=0.05 2024-08-14 19:29:14,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2811120.0, ans=0.125 2024-08-14 19:29:18,020 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 28 from Vox, 41 from AS 2024-08-14 19:29:19,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2811220.0, ans=0.125 2024-08-14 19:29:21,457 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.785e-01 2024-08-14 19:29:23,142 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2024-08-14 19:29:23,417 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.50 vs. 
limit=12.0 2024-08-14 19:29:37,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2811320.0, ans=0.1 2024-08-14 19:29:39,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2811320.0, ans=0.2 2024-08-14 19:29:43,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2811320.0, ans=0.0 2024-08-14 19:29:47,219 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.76 vs. limit=22.5 2024-08-14 19:29:49,366 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 5800, loss[loss=0.1042, beats_loss=0.01098, ecapa_loss=0.0001211, whisper_loss=0.09199, over 18785.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01071, ecapa_loss=0.0001548, whisper_loss=0.09086, over 3910411.07 frames. ], batch size: 72, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:30:12,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2811520.0, ans=0.0 2024-08-14 19:30:22,219 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.246e+01 2.501e+01 2.765e+01 4.187e+01, threshold=5.003e+01, percent-clipped=0.0 2024-08-14 19:30:35,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2811720.0, ans=0.1 2024-08-14 19:30:45,408 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.42 vs. limit=15.0 2024-08-14 19:31:03,317 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 5850, loss[loss=0.1156, beats_loss=0.009439, ecapa_loss=0.0001789, whisper_loss=0.1044, over 19819.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01074, ecapa_loss=0.0001544, whisper_loss=0.09098, over 3899208.87 frames. ], batch size: 81, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:31:16,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2812020.0, ans=0.125 2024-08-14 19:31:28,457 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 from AS 2024-08-14 19:32:03,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2812320.0, ans=0.0 2024-08-14 19:32:04,936 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 14 from Vox, 21 from AS 2024-08-14 19:32:16,366 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 5900, loss[loss=0.08973, beats_loss=0.01343, ecapa_loss=0.000143, whisper_loss=0.07487, over 16420.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01077, ecapa_loss=0.0001538, whisper_loss=0.09063, over 3904595.30 frames. ], batch size: 67, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:32:27,922 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.97 vs. limit=15.0 2024-08-14 19:32:36,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2812520.0, ans=0.2 2024-08-14 19:32:36,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2812520.0, ans=0.125 2024-08-14 19:32:37,793 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 from AS 2024-08-14 19:32:43,686 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
14 from LS+wenet, 19 from Vox, 22 from AS 2024-08-14 19:32:49,367 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.347e+01 2.667e+01 3.027e+01 4.357e+01, threshold=5.334e+01, percent-clipped=0.0 2024-08-14 19:32:56,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2812620.0, ans=0.07 2024-08-14 19:32:57,414 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 27 from Vox, 31 from AS 2024-08-14 19:32:57,988 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.70 vs. limit=15.0 2024-08-14 19:32:59,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2812620.0, ans=0.125 2024-08-14 19:33:01,647 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.04 vs. limit=15.0 2024-08-14 19:33:21,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2812820.0, ans=0.2 2024-08-14 19:33:30,988 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 5950, loss[loss=0.09462, beats_loss=0.009927, ecapa_loss=0.0001663, whisper_loss=0.08303, over 21269.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0108, ecapa_loss=0.000154, whisper_loss=0.09003, over 3881206.38 frames. ], batch size: 88, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:33:54,141 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
24 from LS+wenet, 26 from Vox, 42 from AS 2024-08-14 19:33:55,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2813020.0, ans=0.125 2024-08-14 19:34:04,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2813120.0, ans=0.5 2024-08-14 19:34:07,244 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 20 from Vox, 24 from AS 2024-08-14 19:34:32,033 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 from AS 2024-08-14 19:34:45,162 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 6000, loss[loss=0.1097, beats_loss=0.0111, ecapa_loss=0.0001148, whisper_loss=0.09744, over 22428.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01087, ecapa_loss=0.0001537, whisper_loss=0.08947, over 3886157.06 frames. ], batch size: 87, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:34:45,162 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-14 19:35:23,076 INFO [train_multi_KD3.py:1149] (2/4) Epoch 20, validation on ASR_libri: loss=0.2526, beats_loss=0, ecapa_loss=0.0005442, whisper_loss=0.2472, over 922467.00 frames. 2024-08-14 19:35:42,587 INFO [train_multi_KD3.py:1149] (2/4) Epoch 20, validation on SV_voxceleb1: loss=0.004201, beats_loss=0, ecapa_loss=0.0004201, whisper_loss=0, over 939242.00 frames. 2024-08-14 19:36:27,530 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.0946, 3.9444, 3.4479, 3.7274], device='cuda:2') 2024-08-14 19:37:36,295 INFO [train_multi_KD3.py:1149] (2/4) Epoch 20, validation on AT_audioset: loss=0.02345, beats_loss=0.02345, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-14 19:37:36,299 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-14 19:37:38,272 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.30 vs. limit=15.0 2024-08-14 19:38:06,118 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 26 from LS+wenet, 17 from Vox, 39 from AS 2024-08-14 19:38:10,593 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.295e+01 2.518e+01 2.791e+01 2.335e+02, threshold=5.037e+01, percent-clipped=2.0 2024-08-14 19:38:46,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2813820.0, ans=0.0 2024-08-14 19:38:53,081 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 6050, loss[loss=0.1142, beats_loss=0.009103, ecapa_loss=0.0001752, whisper_loss=0.1033, over 23406.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01079, ecapa_loss=0.0001522, whisper_loss=0.09025, over 3866805.54 frames. ], batch size: 94, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:38:53,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2813920.0, ans=0.125 2024-08-14 19:38:59,110 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 from AS 2024-08-14 19:39:01,233 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. 
limit=15.0 2024-08-14 19:39:06,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2814020.0, ans=0.125 2024-08-14 19:39:17,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2814020.0, ans=0.0 2024-08-14 19:39:21,465 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.55 vs. limit=15.0 2024-08-14 19:39:42,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2814220.0, ans=0.125 2024-08-14 19:39:51,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2814320.0, ans=0.125 2024-08-14 19:40:05,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2814420.0, ans=0.2 2024-08-14 19:40:06,336 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 6100, loss[loss=0.1054, beats_loss=0.01099, ecapa_loss=0.0001388, whisper_loss=0.09303, over 17704.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01077, ecapa_loss=0.0001522, whisper_loss=0.0906, over 3859816.57 frames. ], batch size: 71, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:40:08,579 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2024-08-14 19:40:16,260 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.11 vs. 
limit=15.0 2024-08-14 19:40:20,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2814520.0, ans=0.125 2024-08-14 19:40:38,725 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.270e+01 2.572e+01 2.867e+01 4.147e+01, threshold=5.145e+01, percent-clipped=0.0 2024-08-14 19:40:44,657 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 22 from Vox, 45 from AS 2024-08-14 19:40:56,648 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 25 from Vox, 32 from AS 2024-08-14 19:41:02,365 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 23 from Vox, 35 from AS 2024-08-14 19:41:11,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2814820.0, ans=0.0 2024-08-14 19:41:19,561 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 6150, loss[loss=0.107, beats_loss=0.01159, ecapa_loss=0.0001276, whisper_loss=0.09415, over 14423.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01073, ecapa_loss=0.0001527, whisper_loss=0.09096, over 3901078.29 frames. ], batch size: 56, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:41:20,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2814920.0, ans=0.0 2024-08-14 19:41:26,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2814920.0, ans=0.125 2024-08-14 19:41:40,246 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
32 from LS+wenet, 20 from Vox, 38 from AS 2024-08-14 19:41:40,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2815020.0, ans=0.5 2024-08-14 19:41:40,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2815020.0, ans=0.2 2024-08-14 19:41:44,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2815020.0, ans=0.2 2024-08-14 19:41:46,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2815020.0, ans=0.2 2024-08-14 19:42:00,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2815120.0, ans=0.05 2024-08-14 19:42:00,722 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2024-08-14 19:42:04,703 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 20 from Vox, 45 from AS 2024-08-14 19:42:13,329 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
22 from LS+wenet, 30 from Vox, 41 from AS 2024-08-14 19:42:25,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2815320.0, ans=0.125 2024-08-14 19:42:25,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2815320.0, ans=0.09899494936611666 2024-08-14 19:42:30,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2815320.0, ans=0.1 2024-08-14 19:42:32,963 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 6200, loss[loss=0.1223, beats_loss=0.009625, ecapa_loss=0.0001495, whisper_loss=0.1111, over 18008.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01068, ecapa_loss=0.0001525, whisper_loss=0.09132, over 3882548.83 frames. ], batch size: 68, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:42:33,941 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.49 vs. limit=15.0 2024-08-14 19:42:46,500 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 19 from LS+wenet, 13 from Vox, 22 from AS 2024-08-14 19:42:57,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2815520.0, ans=0.125 2024-08-14 19:43:00,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2815520.0, ans=0.0 2024-08-14 19:43:05,947 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.681e+01 2.332e+01 2.614e+01 2.876e+01 4.461e+01, threshold=5.229e+01, percent-clipped=0.0 2024-08-14 19:43:19,896 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 26 from Vox, 38 from AS 2024-08-14 19:43:28,972 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
21 from LS+wenet, 32 from Vox, 35 from AS 2024-08-14 19:43:29,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2815720.0, ans=0.125 2024-08-14 19:43:48,145 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 6250, loss[loss=0.1036, beats_loss=0.007783, ecapa_loss=0.0001397, whisper_loss=0.09444, over 23080.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.0001523, whisper_loss=0.09091, over 3870706.45 frames. ], batch size: 91, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:44:00,325 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.68 vs. limit=22.5 2024-08-14 19:44:11,836 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.74 vs. limit=15.0 2024-08-14 19:44:14,535 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 23 from LS+wenet, 17 from Vox, 42 from AS 2024-08-14 19:44:20,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2816120.0, ans=0.2 2024-08-14 19:44:33,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2816220.0, ans=0.125 2024-08-14 19:44:51,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2816320.0, ans=0.125 2024-08-14 19:44:53,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2816320.0, ans=0.0 2024-08-14 19:45:01,424 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 6300, loss[loss=0.1178, beats_loss=0.008169, ecapa_loss=0.0001766, whisper_loss=0.1079, over 20587.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.0001521, whisper_loss=0.09094, over 3854110.60 frames. ], batch size: 84, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:45:19,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2816520.0, ans=0.07 2024-08-14 19:45:33,115 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.242e+01 2.428e+01 2.656e+01 5.822e+01, threshold=4.856e+01, percent-clipped=1.0 2024-08-14 19:45:36,051 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 19 from Vox, 35 from AS 2024-08-14 19:45:54,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2816720.0, ans=0.125 2024-08-14 19:46:11,357 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2816820.0, ans=0.125 2024-08-14 19:46:13,712 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 6350, loss[loss=0.1052, beats_loss=0.01112, ecapa_loss=0.0001829, whisper_loss=0.09225, over 20127.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01063, ecapa_loss=0.0001533, whisper_loss=0.09148, over 3859968.54 frames. ], batch size: 84, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:46:15,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2816920.0, ans=0.09899494936611666 2024-08-14 19:46:49,505 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 24 from LS+wenet, 17 from Vox, 16 from AS 2024-08-14 19:46:52,688 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
20 from LS+wenet, 21 from Vox, 25 from AS 2024-08-14 19:47:20,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2817320.0, ans=0.1 2024-08-14 19:47:26,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2817320.0, ans=0.125 2024-08-14 19:47:28,798 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 6400, loss[loss=0.1248, beats_loss=0.008456, ecapa_loss=0.0001852, whisper_loss=0.1145, over 22130.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01059, ecapa_loss=0.0001536, whisper_loss=0.09214, over 3877437.48 frames. ], batch size: 89, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:47:52,232 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 34 from LS+wenet, 26 from Vox, 36 from AS 2024-08-14 19:47:58,502 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 14 from LS+wenet, 15 from Vox, 34 from AS 2024-08-14 19:48:01,126 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.350e+01 2.618e+01 2.916e+01 9.868e+01, threshold=5.236e+01, percent-clipped=1.0 2024-08-14 19:48:03,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2817620.0, ans=0.2 2024-08-14 19:48:06,971 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 27 from LS+wenet, 18 from Vox, 27 from AS 2024-08-14 19:48:07,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2817620.0, ans=0.0 2024-08-14 19:48:10,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2817620.0, ans=0.0 2024-08-14 19:48:32,109 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
36 from LS+wenet, 20 from Vox, 30 from AS 2024-08-14 19:48:33,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2817820.0, ans=0.1 2024-08-14 19:48:36,027 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-08-14 19:48:36,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2817820.0, ans=0.125 2024-08-14 19:48:42,694 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 6450, loss[loss=0.1008, beats_loss=0.01216, ecapa_loss=0.0001152, whisper_loss=0.08748, over 22450.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01054, ecapa_loss=0.0001539, whisper_loss=0.09241, over 3905111.68 frames. ], batch size: 87, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:48:44,343 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 from AS 2024-08-14 19:48:53,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2817920.0, ans=0.0 2024-08-14 19:48:57,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2818020.0, ans=0.125 2024-08-14 19:49:15,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2818120.0, ans=0.1 2024-08-14 19:49:17,963 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 from AS 2024-08-14 19:49:27,721 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
23 from LS+wenet, 12 from Vox, 19 from AS 2024-08-14 19:49:42,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2818220.0, ans=0.05 2024-08-14 19:49:47,928 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 28 from Vox, 33 from AS 2024-08-14 19:50:00,105 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 6500, loss[loss=0.1086, beats_loss=0.009595, ecapa_loss=0.0001622, whisper_loss=0.09742, over 19193.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01057, ecapa_loss=0.0001537, whisper_loss=0.09249, over 3928020.70 frames. ], batch size: 79, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:50:00,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2818420.0, ans=0.1 2024-08-14 19:50:04,825 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 24 from Vox, 34 from AS 2024-08-14 19:50:08,086 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 from AS 2024-08-14 19:50:19,081 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 20 from LS+wenet, 15 from Vox, 22 from AS 2024-08-14 19:50:20,606 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 27 from LS+wenet, 15 from Vox, 28 from AS 2024-08-14 19:50:35,376 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.395e+01 2.629e+01 2.951e+01 4.669e+01, threshold=5.259e+01, percent-clipped=0.0 2024-08-14 19:50:39,359 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 24 from LS+wenet, 13 from Vox, 24 from AS 2024-08-14 19:50:39,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2818620.0, ans=0.0 2024-08-14 19:50:42,958 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=25.63 vs. 
limit=22.5 2024-08-14 19:51:16,471 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 6550, loss[loss=0.08538, beats_loss=0.01176, ecapa_loss=0.0001609, whisper_loss=0.072, over 17306.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01066, ecapa_loss=0.0001532, whisper_loss=0.09223, over 3948829.89 frames. ], batch size: 74, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:51:18,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2818920.0, ans=0.2 2024-08-14 19:51:19,019 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-08-14 19:51:37,505 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 19:51:46,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2819120.0, ans=0.125 2024-08-14 19:52:06,324 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 15 from LS+wenet, 25 from Vox, 38 from AS 2024-08-14 19:52:11,534 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 10 from LS+wenet, 28 from Vox, 34 from AS 2024-08-14 19:52:12,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2819220.0, ans=0.07 2024-08-14 19:52:35,357 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2819420.0, ans=0.1 2024-08-14 19:52:36,044 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 6600, loss[loss=0.09205, beats_loss=0.01338, ecapa_loss=0.0001649, whisper_loss=0.07702, over 21619.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01069, ecapa_loss=0.000155, whisper_loss=0.09152, over 3947955.32 frames. 
], batch size: 92, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:52:46,943 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.07 vs. limit=12.0 2024-08-14 19:52:50,003 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 17 from Vox, 40 from AS 2024-08-14 19:53:06,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2819520.0, ans=0.125 2024-08-14 19:53:07,386 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 21 from Vox, 26 from AS 2024-08-14 19:53:13,115 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.460e+01 2.689e+01 3.191e+01 5.178e+01, threshold=5.378e+01, percent-clipped=0.0 2024-08-14 19:53:16,240 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 29 from Vox, 28 from AS 2024-08-14 19:53:18,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2819620.0, ans=0.0 2024-08-14 19:53:23,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2819720.0, ans=0.1 2024-08-14 19:53:38,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2819820.0, ans=0.125 2024-08-14 19:53:48,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2819820.0, ans=0.07 2024-08-14 19:53:55,287 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 6650, loss[loss=0.1067, beats_loss=0.01215, ecapa_loss=0.0001273, whisper_loss=0.09323, over 22982.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01061, ecapa_loss=0.0001562, whisper_loss=0.09166, over 3950725.96 frames. 
], batch size: 91, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:53:56,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2819920.0, ans=0.0 2024-08-14 19:54:05,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2819920.0, ans=0.125 2024-08-14 19:54:05,937 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.00 vs. limit=15.0 2024-08-14 19:54:17,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2820020.0, ans=0.0 2024-08-14 19:54:40,777 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.93 vs. limit=15.0 2024-08-14 19:54:46,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2820220.0, ans=0.125 2024-08-14 19:54:58,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2820320.0, ans=0.125 2024-08-14 19:55:01,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2820320.0, ans=0.1 2024-08-14 19:55:11,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2820320.0, ans=0.125 2024-08-14 19:55:15,206 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 6700, loss[loss=0.09887, beats_loss=0.01244, ecapa_loss=0.0001189, whisper_loss=0.08524, over 17755.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01063, ecapa_loss=0.0001546, whisper_loss=0.09164, over 3924836.24 frames. 
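The frequent `scaling.py:214` entries report a `ScheduledFloat` value (`ans`) for the current `batch_count` (dropout probabilities, skip rates, balancer probs, and so on). A minimal sketch of such a schedule, assuming piecewise-linear interpolation between `(batch_count, value)` breakpoints with clamping at the ends; the breakpoints below are illustrative, not taken from the log:

```python
def scheduled_float(batch_count, *points):
    """Assumed batch-count-keyed schedule: piecewise-linear interpolation
    between (batch_count, value) breakpoints, clamped outside the range,
    in the spirit of the ScheduledFloat values printed by scaling.py."""
    points = sorted(points)
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
```

For example, a schedule from 0.3 at batch 0 down to 0.1 at batch 2000 would log `ans=0.2` at `batch_count=1000` and stay at 0.1 thereafter.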
], batch size: 69, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:55:18,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2820420.0, ans=0.0 2024-08-14 19:55:23,151 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 12 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-14 19:55:40,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2820520.0, ans=0.0 2024-08-14 19:55:50,073 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.346e+01 2.527e+01 2.810e+01 4.755e+01, threshold=5.054e+01, percent-clipped=0.0 2024-08-14 19:56:13,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2820720.0, ans=0.0 2024-08-14 19:56:17,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2820820.0, ans=0.125 2024-08-14 19:56:24,597 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.17 vs. limit=15.0 2024-08-14 19:56:28,800 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-14 19:56:32,317 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 6750, loss[loss=0.1035, beats_loss=0.01176, ecapa_loss=0.0001348, whisper_loss=0.09041, over 23287.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01069, ecapa_loss=0.0001544, whisper_loss=0.09097, over 3916885.17 frames. ], batch size: 94, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:56:39,218 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
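Each `optim.py:476` entry prints five grad-norm quantiles (min, 25%, median, 75%, max) plus a clipping threshold, and in every entry above the threshold equals `Clipping_scale` times the logged median: e.g. 2.0 × 2.527e+01 = 5.054e+01 in the entry just above. A sketch of that assumed relation (the exact bookkeeping inside icefall's `optim.py` may differ):

```python
import statistics

def clipping_threshold(recent_grad_norms, clipping_scale=2.0):
    """Assumed rule relating the logged quantities: the clipping threshold
    is clipping_scale times the median of recent per-batch grad norms,
    e.g. 2.0 * 2.527e+01 = 5.054e+01 as in the log entry above."""
    return clipping_scale * statistics.median(recent_grad_norms)
```

Under this reading, `percent-clipped` would then be the fraction of recent batches whose gradient norm exceeded that threshold.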
19 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 19:56:43,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2820920.0, ans=0.0 2024-08-14 19:57:08,218 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.21 vs. limit=15.0 2024-08-14 19:57:19,991 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-14 19:57:37,287 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.88 vs. limit=15.0 2024-08-14 19:57:44,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2821320.0, ans=0.1 2024-08-14 19:57:50,937 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 6800, loss[loss=0.1137, beats_loss=0.009296, ecapa_loss=0.0001754, whisper_loss=0.1026, over 23130.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01062, ecapa_loss=0.000156, whisper_loss=0.09105, over 3914088.28 frames. ], batch size: 92, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:58:04,975 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-14 19:58:06,491 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
18 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 19:58:08,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2821520.0, ans=0.125 2024-08-14 19:58:27,161 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.395e+01 2.601e+01 3.094e+01 9.420e+01, threshold=5.202e+01, percent-clipped=3.0 2024-08-14 19:58:31,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2821620.0, ans=0.125 2024-08-14 19:58:39,971 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-14 19:58:50,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2821720.0, ans=0.0 2024-08-14 19:59:07,975 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.69 vs. limit=15.0 2024-08-14 19:59:08,223 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 6850, loss[loss=0.1137, beats_loss=0.01099, ecapa_loss=0.0001401, whisper_loss=0.1013, over 17326.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01062, ecapa_loss=0.0001551, whisper_loss=0.09068, over 3914855.94 frames. ], batch size: 66, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:59:23,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2822020.0, ans=0.2 2024-08-14 19:59:41,956 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.25 vs. 
limit=22.5 2024-08-14 19:59:48,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2822120.0, ans=0.0 2024-08-14 19:59:56,104 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.078e+01 2024-08-14 19:59:57,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2822220.0, ans=0.2 2024-08-14 20:00:03,381 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.62 vs. limit=15.0 2024-08-14 20:00:23,620 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 6900, loss[loss=0.1007, beats_loss=0.01181, ecapa_loss=0.0001153, whisper_loss=0.08776, over 23863.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0106, ecapa_loss=0.0001544, whisper_loss=0.09109, over 3884960.35 frames. ], batch size: 91, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 20:00:57,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2822620.0, ans=0.0 2024-08-14 20:00:59,558 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.271e+01 2.537e+01 2.771e+01 4.123e+01, threshold=5.074e+01, percent-clipped=0.0 2024-08-14 20:01:21,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2822720.0, ans=0.125 2024-08-14 20:01:28,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2822820.0, ans=0.0 2024-08-14 20:01:28,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2822820.0, ans=0.125 2024-08-14 20:01:31,926 INFO [scaling.py:214] (2/4) ScheduledFloat: 
name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=2822820.0, ans=0.05 2024-08-14 20:01:33,318 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.408e+01 2024-08-14 20:01:40,094 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 6950, loss[loss=0.0774, beats_loss=0.01227, ecapa_loss=0.0001547, whisper_loss=0.06358, over 17379.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01061, ecapa_loss=0.0001539, whisper_loss=0.09118, over 3853227.07 frames. ], batch size: 70, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 20:01:54,298 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 12 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-14 20:02:31,318 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.319e+00 2024-08-14 20:02:55,620 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 7000, loss[loss=0.1078, beats_loss=0.01153, ecapa_loss=0.0001534, whisper_loss=0.09469, over 22024.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01068, ecapa_loss=0.0001538, whisper_loss=0.09088, over 3832418.15 frames. ], batch size: 89, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 20:03:08,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2823420.0, ans=0.0 2024-08-14 20:03:12,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2823520.0, ans=0.1 2024-08-14 20:03:15,420 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 20:03:26,163 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.35 vs. 
limit=10.0 2024-08-14 20:03:29,706 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.376e+01 2.618e+01 2.959e+01 4.269e+01, threshold=5.237e+01, percent-clipped=0.0 2024-08-14 20:03:41,607 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 27 from LS+wenet, 11 from Vox, 19 fro AS 2024-08-14 20:03:48,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2823720.0, ans=0.125 2024-08-14 20:03:50,382 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 20:03:53,269 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 20:04:09,292 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 7050, loss[loss=0.1121, beats_loss=0.009801, ecapa_loss=0.0001541, whisper_loss=0.1008, over 17410.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01076, ecapa_loss=0.0001534, whisper_loss=0.09061, over 3877406.84 frames. ], batch size: 69, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 20:04:10,491 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.28 vs. limit=15.0 2024-08-14 20:04:27,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2824020.0, ans=0.125 2024-08-14 20:04:31,184 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.31 vs. limit=5.0 2024-08-14 20:04:50,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2824120.0, ans=0.125 2024-08-14 20:04:53,604 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
19 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-14 20:05:09,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2824320.0, ans=0.125 2024-08-14 20:05:14,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2824320.0, ans=0.0 2024-08-14 20:05:19,672 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.25 vs. limit=6.0 2024-08-14 20:05:24,661 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 7100, loss[loss=0.08794, beats_loss=0.01046, ecapa_loss=0.0001983, whisper_loss=0.07551, over 15155.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01081, ecapa_loss=0.0001531, whisper_loss=0.09054, over 3888848.35 frames. ], batch size: 66, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:05:32,144 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 20:05:37,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2824420.0, ans=0.0 2024-08-14 20:06:00,630 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.271e+01 2.594e+01 2.929e+01 4.373e+01, threshold=5.188e+01, percent-clipped=0.0 2024-08-14 20:06:08,406 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.317e-02 2024-08-14 20:06:09,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2824720.0, ans=0.125 2024-08-14 20:06:20,743 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.28 vs. 
limit=15.0 2024-08-14 20:06:21,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2824720.0, ans=0.07 2024-08-14 20:06:38,657 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 7150, loss[loss=0.1041, beats_loss=0.01336, ecapa_loss=0.0001068, whisper_loss=0.08969, over 22740.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0108, ecapa_loss=0.0001542, whisper_loss=0.09009, over 3870019.49 frames. ], batch size: 88, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:06:39,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2824920.0, ans=15.0 2024-08-14 20:06:40,251 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-14 20:06:44,918 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 20:06:58,183 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-14 20:07:01,014 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 31 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-14 20:07:04,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2825020.0, ans=0.0 2024-08-14 20:07:15,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2825120.0, ans=0.2 2024-08-14 20:07:17,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2825120.0, ans=0.05 2024-08-14 20:07:22,001 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
22 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-14 20:07:23,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2825220.0, ans=0.1 2024-08-14 20:07:36,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2825220.0, ans=0.0 2024-08-14 20:07:38,965 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-14 20:07:41,103 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.01 vs. limit=15.0 2024-08-14 20:07:41,543 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-14 20:07:44,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2825320.0, ans=0.1 2024-08-14 20:07:49,233 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-14 20:07:53,189 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 7200, loss[loss=0.08103, beats_loss=0.01243, ecapa_loss=0.0001326, whisper_loss=0.06727, over 21838.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01081, ecapa_loss=0.0001547, whisper_loss=0.09008, over 3905293.26 frames. ], batch size: 89, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:08:27,901 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.392e+01 2.665e+01 3.083e+01 4.439e+01, threshold=5.330e+01, percent-clipped=0.0 2024-08-14 20:08:28,269 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-14 20:08:32,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2825620.0, ans=0.0 2024-08-14 20:08:37,360 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
25 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-14 20:08:53,734 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.41 vs. limit=22.5 2024-08-14 20:08:58,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2825820.0, ans=0.2 2024-08-14 20:08:59,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2825820.0, ans=0.125 2024-08-14 20:09:06,726 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 7250, loss[loss=0.1017, beats_loss=0.009949, ecapa_loss=0.0001628, whisper_loss=0.09015, over 21493.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01078, ecapa_loss=0.0001534, whisper_loss=0.09076, over 3916788.66 frames. ], batch size: 88, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:09:26,931 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 17 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-14 20:09:28,497 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 20:09:36,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2826120.0, ans=0.0 2024-08-14 20:09:37,300 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-14 20:09:45,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2826120.0, ans=0.125 2024-08-14 20:09:51,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2826220.0, ans=0.0 2024-08-14 20:10:07,795 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.23 vs. 
limit=10.0 2024-08-14 20:10:16,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2826320.0, ans=0.0 2024-08-14 20:10:18,540 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 22 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-14 20:10:19,757 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 7300, loss[loss=0.08264, beats_loss=0.01129, ecapa_loss=0.0001645, whisper_loss=0.06971, over 21158.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01083, ecapa_loss=0.0001535, whisper_loss=0.09026, over 3905383.99 frames. ], batch size: 89, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:10:52,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2826420.0, ans=0.0 2024-08-14 20:10:59,184 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.295e-01 2024-08-14 20:11:05,029 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 24 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-14 20:11:26,229 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.805e+01 2.413e+01 2.619e+01 3.021e+01 6.286e+01, threshold=5.238e+01, percent-clipped=1.0 2024-08-14 20:11:27,985 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 20:11:29,548 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 29 from LS+wenet, 29 from Vox, 16 fro AS 2024-08-14 20:11:48,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2826820.0, ans=0.0 2024-08-14 20:11:52,090 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.331e-01 2024-08-14 20:11:58,164 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.70 vs. 
limit=15.0 2024-08-14 20:12:01,580 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-14 20:12:04,216 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 7350, loss[loss=0.1339, beats_loss=0.00834, ecapa_loss=0.0001494, whisper_loss=0.1241, over 22267.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0107, ecapa_loss=0.0001551, whisper_loss=0.09131, over 3884633.10 frames. ], batch size: 84, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:12:10,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2826920.0, ans=0.125 2024-08-14 20:12:12,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2826920.0, ans=0.125 2024-08-14 20:12:16,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2826920.0, ans=0.125 2024-08-14 20:12:17,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2827020.0, ans=0.0 2024-08-14 20:12:26,861 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=15.0 2024-08-14 20:12:29,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2827020.0, ans=0.2 2024-08-14 20:12:49,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2827220.0, ans=0.125 2024-08-14 20:13:00,523 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
25 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 20:13:01,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2827220.0, ans=0.1 2024-08-14 20:13:13,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2827320.0, ans=0.04949747468305833 2024-08-14 20:13:21,558 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 7400, loss[loss=0.1136, beats_loss=0.01054, ecapa_loss=0.0001612, whisper_loss=0.1014, over 22760.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01079, ecapa_loss=0.0001543, whisper_loss=0.09032, over 3895796.68 frames. ], batch size: 92, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:13:45,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2827520.0, ans=0.2 2024-08-14 20:13:50,285 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-14 20:13:59,731 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.726e+01 2.369e+01 2.692e+01 3.042e+01 1.751e+02, threshold=5.383e+01, percent-clipped=2.0 2024-08-14 20:14:37,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2827820.0, ans=0.1 2024-08-14 20:14:40,247 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 7450, loss[loss=0.1044, beats_loss=0.009742, ecapa_loss=0.0001193, whisper_loss=0.09347, over 17909.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01066, ecapa_loss=0.0001554, whisper_loss=0.09102, over 3889585.69 frames. ], batch size: 67, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:14:56,973 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.04 vs. 
limit=15.0 2024-08-14 20:15:02,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2828020.0, ans=0.125 2024-08-14 20:15:05,821 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-14 20:15:11,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2828020.0, ans=0.0 2024-08-14 20:15:15,344 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.80 vs. limit=22.5 2024-08-14 20:15:19,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2828120.0, ans=0.125 2024-08-14 20:15:31,607 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 20:15:50,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2828220.0, ans=0.125 2024-08-14 20:16:10,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2828320.0, ans=0.125 2024-08-14 20:16:15,667 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 7500, loss[loss=0.1165, beats_loss=0.008349, ecapa_loss=0.0001605, whisper_loss=0.1065, over 15601.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01065, ecapa_loss=0.0001551, whisper_loss=0.09125, over 3891717.95 frames. ], batch size: 60, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:16:27,975 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-14 20:16:32,094 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
26 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-14 20:16:40,143 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.12 vs. limit=15.0 2024-08-14 20:16:57,731 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.69 vs. limit=15.0 2024-08-14 20:17:01,587 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.331e+01 2.565e+01 2.874e+01 3.652e+01, threshold=5.131e+01, percent-clipped=0.0 2024-08-14 20:17:04,269 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 20 from Vox, 50 fro AS 2024-08-14 20:17:04,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2828620.0, ans=0.05 2024-08-14 20:17:04,764 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.73 vs. limit=15.0 2024-08-14 20:17:19,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2828720.0, ans=0.0 2024-08-14 20:17:48,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2828820.0, ans=10.0 2024-08-14 20:17:51,656 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 7550, loss[loss=0.1082, beats_loss=0.01123, ecapa_loss=0.000137, whisper_loss=0.09557, over 22692.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01066, ecapa_loss=0.0001548, whisper_loss=0.09091, over 3917043.00 frames. 
], batch size: 89, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:17:56,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2828920.0, ans=0.1 2024-08-14 20:18:23,735 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 17 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-14 20:18:25,800 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.11 vs. limit=15.0 2024-08-14 20:18:33,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2829120.0, ans=0.015 2024-08-14 20:18:36,353 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.91 vs. limit=15.0 2024-08-14 20:18:39,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2829120.0, ans=0.125 2024-08-14 20:19:25,531 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.41 vs. limit=22.5 2024-08-14 20:19:25,945 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 7600, loss[loss=0.08457, beats_loss=0.009582, ecapa_loss=0.0001579, whisper_loss=0.07341, over 14671.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01068, ecapa_loss=0.0001537, whisper_loss=0.09021, over 3856043.18 frames. ], batch size: 57, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:19:26,159 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 12 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-14 20:19:30,604 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.49 vs. 
limit=15.0 2024-08-14 20:19:31,356 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-14 20:19:31,972 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.31 vs. limit=15.0 2024-08-14 20:19:37,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2829420.0, ans=0.0 2024-08-14 20:19:49,019 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.20 vs. limit=15.0 2024-08-14 20:19:56,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2829520.0, ans=0.04949747468305833 2024-08-14 20:20:00,225 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.47 vs. limit=15.0 2024-08-14 20:20:04,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2829620.0, ans=0.125 2024-08-14 20:20:07,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2829620.0, ans=0.125 2024-08-14 20:20:08,917 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.345e+01 2.622e+01 3.093e+01 1.598e+02, threshold=5.244e+01, percent-clipped=3.0 2024-08-14 20:20:15,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2829620.0, ans=0.2 2024-08-14 20:20:16,805 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
31 from LS+wenet, 31 from Vox, 28 fro AS 2024-08-14 20:20:18,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2829720.0, ans=0.1 2024-08-14 20:20:22,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2829720.0, ans=0.0 2024-08-14 20:20:26,422 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-14 20:20:27,909 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-14 20:20:35,811 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=15.0 2024-08-14 20:20:46,430 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 7650, loss[loss=0.1111, beats_loss=0.01152, ecapa_loss=0.0001362, whisper_loss=0.09822, over 20903.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01057, ecapa_loss=0.0001549, whisper_loss=0.09057, over 3834005.56 frames. ], batch size: 81, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:21:03,049 WARNING [optim.py:496] (2/4) Scaling gradients by 0.061782095581293106, model_norm_threshold=52.43657684326172 2024-08-14 20:21:03,229 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.23, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.640e+05, grad_sumsq=1.648e+07, orig_rms_sq=9.952e-03 2024-08-14 20:21:17,145 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.75 vs. 
limit=15.0 2024-08-14 20:21:36,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2830220.0, ans=0.0 2024-08-14 20:21:39,203 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0 2024-08-14 20:21:57,167 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 7700, loss[loss=0.1116, beats_loss=0.009331, ecapa_loss=0.0001723, whisper_loss=0.1005, over 22808.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01052, ecapa_loss=0.0001553, whisper_loss=0.0908, over 3855647.06 frames. ], batch size: 90, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:22:15,639 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-14 20:22:17,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2830520.0, ans=0.04949747468305833 2024-08-14 20:22:21,067 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 27 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 20:22:22,580 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
17 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 20:22:26,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2830620.0, ans=0.125 2024-08-14 20:22:30,738 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.409e+01 2.589e+01 2.990e+01 8.487e+02, threshold=5.178e+01, percent-clipped=3.0 2024-08-14 20:22:35,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2830620.0, ans=0.125 2024-08-14 20:23:07,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2830920.0, ans=0.125 2024-08-14 20:23:08,409 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 7750, loss[loss=0.1254, beats_loss=0.009179, ecapa_loss=0.000166, whisper_loss=0.1145, over 17048.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.0001552, whisper_loss=0.09042, over 3857262.63 frames. ], batch size: 68, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:23:22,901 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-14 20:23:31,342 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-14 20:23:54,515 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-14 20:23:55,423 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.28 vs. limit=15.0 2024-08-14 20:24:08,817 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-14 20:24:10,062 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
24 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 20:24:19,817 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 7800, loss[loss=0.1097, beats_loss=0.008602, ecapa_loss=0.0001584, whisper_loss=0.0995, over 18122.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01059, ecapa_loss=0.0001527, whisper_loss=0.09079, over 3886024.00 frames. ], batch size: 71, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:24:43,266 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 15 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-14 20:24:44,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2831520.0, ans=0.0 2024-08-14 20:24:45,914 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-14 20:24:49,692 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.53 vs. limit=15.0 2024-08-14 20:24:52,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2831620.0, ans=0.07 2024-08-14 20:24:54,723 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.359e+01 2.580e+01 2.928e+01 4.088e+01, threshold=5.160e+01, percent-clipped=0.0 2024-08-14 20:25:04,144 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
23 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-14 20:25:08,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2831720.0, ans=0.5 2024-08-14 20:25:17,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2831820.0, ans=0.0 2024-08-14 20:25:20,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2831820.0, ans=0.125 2024-08-14 20:25:30,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2831820.0, ans=0.125 2024-08-14 20:25:32,186 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 7850, loss[loss=0.09977, beats_loss=0.009519, ecapa_loss=0.0001231, whisper_loss=0.08902, over 18189.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01063, ecapa_loss=0.0001529, whisper_loss=0.09039, over 3875218.29 frames. ], batch size: 69, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:25:43,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2831920.0, ans=0.125 2024-08-14 20:25:47,088 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 23 from LS+wenet, 32 from Vox, 36 fro AS 2024-08-14 20:25:50,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2832020.0, ans=0.125 2024-08-14 20:25:52,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2832020.0, ans=0.0 2024-08-14 20:26:04,299 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-14 20:26:05,578 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
24 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-14 20:26:05,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2832120.0, ans=0.04949747468305833 2024-08-14 20:26:07,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2832120.0, ans=0.0 2024-08-14 20:26:17,520 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.62 vs. limit=22.5 2024-08-14 20:26:19,421 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-14 20:26:26,728 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-14 20:26:43,269 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 7900, loss[loss=0.112, beats_loss=0.008068, ecapa_loss=0.0001181, whisper_loss=0.1028, over 15457.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01067, ecapa_loss=0.0001529, whisper_loss=0.09048, over 3862926.40 frames. ], batch size: 53, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:26:54,801 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.92 vs. limit=15.0 2024-08-14 20:27:03,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2832520.0, ans=0.0 2024-08-14 20:27:10,783 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.34 vs. limit=15.0 2024-08-14 20:27:17,071 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
23 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-14 20:27:18,408 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.337e+01 2.582e+01 2.870e+01 4.311e+01, threshold=5.164e+01, percent-clipped=0.0 2024-08-14 20:27:22,186 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0 2024-08-14 20:27:56,465 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 7950, loss[loss=0.09625, beats_loss=0.009929, ecapa_loss=0.0002126, whisper_loss=0.0842, over 18721.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01066, ecapa_loss=0.0001531, whisper_loss=0.09122, over 3865221.86 frames. ], batch size: 81, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:27:57,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2832920.0, ans=0.1 2024-08-14 20:28:07,361 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-14 20:28:17,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2833020.0, ans=0.0 2024-08-14 20:28:17,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2833020.0, ans=0.125 2024-08-14 20:28:18,239 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-14 20:28:25,675 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-14 20:28:30,994 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.48 vs. 
limit=12.0 2024-08-14 20:28:31,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2833120.0, ans=0.125 2024-08-14 20:28:33,309 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2833120.0, ans=0.125 2024-08-14 20:28:51,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2833220.0, ans=0.1 2024-08-14 20:29:03,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2833320.0, ans=0.0 2024-08-14 20:29:09,129 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 8000, loss[loss=0.1156, beats_loss=0.01124, ecapa_loss=0.0001516, whisper_loss=0.1028, over 19575.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01063, ecapa_loss=0.0001539, whisper_loss=0.09172, over 3872705.84 frames. ], batch size: 75, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:29:14,837 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.41 vs. limit=22.5 2024-08-14 20:29:27,007 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2024-08-14 20:29:31,751 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.15 vs. limit=12.0 2024-08-14 20:29:34,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2833520.0, ans=0.125 2024-08-14 20:29:34,805 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.62 vs. 
limit=15.0 2024-08-14 20:29:38,386 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.24 vs. limit=15.0 2024-08-14 20:29:43,234 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.316e+01 2.668e+01 3.025e+01 4.748e+01, threshold=5.335e+01, percent-clipped=0.0 2024-08-14 20:29:49,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2833620.0, ans=0.125 2024-08-14 20:30:20,487 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 8050, loss[loss=0.08152, beats_loss=0.01262, ecapa_loss=0.0001831, whisper_loss=0.06707, over 20568.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01057, ecapa_loss=0.0001543, whisper_loss=0.09172, over 3839273.70 frames. ], batch size: 88, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:30:27,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2833920.0, ans=0.125 2024-08-14 20:30:34,266 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.90 vs. limit=15.0 2024-08-14 20:30:51,020 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-14 20:30:54,431 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.58 vs. limit=15.0 2024-08-14 20:30:59,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2834120.0, ans=0.0 2024-08-14 20:31:31,747 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 8100, loss[loss=0.07691, beats_loss=0.01317, ecapa_loss=0.0001499, whisper_loss=0.06225, over 15358.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01058, ecapa_loss=0.0001547, whisper_loss=0.0912, over 3853825.58 frames. ], batch size: 63, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:31:33,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2834420.0, ans=0.125 2024-08-14 20:31:34,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2834420.0, ans=15.0 2024-08-14 20:31:44,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2834420.0, ans=0.125 2024-08-14 20:31:51,510 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0 2024-08-14 20:31:59,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2834620.0, ans=0.125 2024-08-14 20:32:06,292 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.00 vs. limit=15.0 2024-08-14 20:32:06,613 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.290e+01 2.522e+01 2.889e+01 4.208e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-14 20:32:18,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2834720.0, ans=0.2 2024-08-14 20:32:24,065 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 20:32:33,022 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
22 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-14 20:32:41,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2834820.0, ans=0.1 2024-08-14 20:32:45,017 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 8150, loss[loss=0.08052, beats_loss=0.012, ecapa_loss=0.0001462, whisper_loss=0.06705, over 18350.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01059, ecapa_loss=0.0001542, whisper_loss=0.09086, over 3863605.14 frames. ], batch size: 73, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:32:54,919 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0 2024-08-14 20:32:55,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2834920.0, ans=0.0 2024-08-14 20:33:27,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2835120.0, ans=0.1 2024-08-14 20:33:29,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2835220.0, ans=0.125 2024-08-14 20:33:49,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2835320.0, ans=0.125 2024-08-14 20:33:50,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2835320.0, ans=0.125 2024-08-14 20:33:55,608 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
28 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 20:33:57,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2835420.0, ans=0.0 2024-08-14 20:33:57,778 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.40 vs. limit=22.5 2024-08-14 20:33:58,469 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 8200, loss[loss=0.1051, beats_loss=0.009763, ecapa_loss=0.0001961, whisper_loss=0.09341, over 14998.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01056, ecapa_loss=0.0001544, whisper_loss=0.09122, over 3873596.70 frames. ], batch size: 63, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:34:01,335 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-14 20:34:04,209 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-14 20:34:19,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2835520.0, ans=0.0 2024-08-14 20:34:20,438 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 29 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-14 20:34:21,862 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 36 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-14 20:34:26,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2835620.0, ans=0.0 2024-08-14 20:34:33,576 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.288e+01 2.494e+01 2.883e+01 1.855e+02, threshold=4.988e+01, percent-clipped=1.0 2024-08-14 20:34:34,732 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.86 vs. 
limit=15.0 2024-08-14 20:34:53,815 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.43 vs. limit=15.0 2024-08-14 20:34:57,287 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-14 20:35:07,124 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-14 20:35:10,891 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 8250, loss[loss=0.08388, beats_loss=0.01017, ecapa_loss=0.0001709, whisper_loss=0.072, over 15231.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01064, ecapa_loss=0.0001533, whisper_loss=0.09127, over 3863258.38 frames. ], batch size: 63, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:35:25,684 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-14 20:35:35,959 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 20 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 20:35:37,439 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 13 from Vox, 53 fro AS 2024-08-14 20:36:23,077 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 8300, loss[loss=0.09459, beats_loss=0.009347, ecapa_loss=0.0001365, whisper_loss=0.08388, over 19209.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01061, ecapa_loss=0.0001516, whisper_loss=0.09127, over 3858961.65 frames. ], batch size: 72, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:36:35,926 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.57 vs. limit=15.0 2024-08-14 20:36:37,981 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
26 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 20:36:51,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2836620.0, ans=0.125 2024-08-14 20:36:57,777 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.392e+01 2.726e+01 3.062e+01 2.103e+02, threshold=5.453e+01, percent-clipped=2.0 2024-08-14 20:37:16,447 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 20:37:18,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2836720.0, ans=0.0 2024-08-14 20:37:26,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2836820.0, ans=0.125 2024-08-14 20:37:33,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2836920.0, ans=0.0 2024-08-14 20:37:34,530 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 8350, loss[loss=0.104, beats_loss=0.01304, ecapa_loss=0.0001088, whisper_loss=0.08992, over 24128.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0106, ecapa_loss=0.0001522, whisper_loss=0.09106, over 3857251.46 frames. ], batch size: 93, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:37:37,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2836920.0, ans=0.0 2024-08-14 20:37:42,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2836920.0, ans=0.0 2024-08-14 20:37:46,696 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 20:38:09,679 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 20:38:23,768 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-14 20:38:25,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2837220.0, ans=0.125 2024-08-14 20:38:26,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2837220.0, ans=0.0 2024-08-14 20:38:44,002 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 20:38:46,951 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 8400, loss[loss=0.1075, beats_loss=0.01187, ecapa_loss=0.000137, whisper_loss=0.09429, over 23451.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01057, ecapa_loss=0.0001524, whisper_loss=0.09172, over 3867426.33 frames. ], batch size: 93, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:38:47,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2837420.0, ans=0.125 2024-08-14 20:38:50,242 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-14 20:38:51,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=2837420.0, ans=0.05 2024-08-14 20:39:00,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2837520.0, ans=0.2 2024-08-14 20:39:00,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2837520.0, ans=0.125 2024-08-14 20:39:18,201 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
21 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-14 20:39:22,148 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.308e+01 2.540e+01 2.813e+01 3.907e+01, threshold=5.081e+01, percent-clipped=0.0 2024-08-14 20:39:30,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2837720.0, ans=0.0 2024-08-14 20:39:45,797 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 20:39:47,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2837820.0, ans=0.125 2024-08-14 20:39:51,120 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.23 vs. limit=6.0 2024-08-14 20:39:57,625 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.78 vs. limit=22.5 2024-08-14 20:39:59,898 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 8450, loss[loss=0.1139, beats_loss=0.01064, ecapa_loss=0.0001279, whisper_loss=0.1019, over 23518.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01049, ecapa_loss=0.0001536, whisper_loss=0.09159, over 3850869.54 frames. ], batch size: 93, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:40:06,402 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.53 vs. limit=15.0 2024-08-14 20:40:09,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2837920.0, ans=0.125 2024-08-14 20:40:12,347 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.23 vs. 
limit=10.0 2024-08-14 20:40:15,836 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-14 20:40:20,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2838020.0, ans=0.125 2024-08-14 20:40:30,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2838120.0, ans=0.1 2024-08-14 20:40:30,829 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.65 vs. limit=10.0 2024-08-14 20:40:34,495 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 22 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-14 20:40:41,277 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 26 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-14 20:40:57,731 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 27 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-14 20:40:59,078 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-14 20:41:03,162 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 33 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 20:41:07,374 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 33 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 20:41:11,482 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 8500, loss[loss=0.09862, beats_loss=0.008401, ecapa_loss=0.0001912, whisper_loss=0.0883, over 13454.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01046, ecapa_loss=0.0001543, whisper_loss=0.0918, over 3855333.57 frames. 
], batch size: 54, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:41:45,639 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.375e+01 2.644e+01 3.031e+01 3.106e+02, threshold=5.288e+01, percent-clipped=1.0 2024-08-14 20:42:05,781 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 35 from Vox, 26 fro AS 2024-08-14 20:42:16,150 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.814e-01 2024-08-14 20:42:20,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2838820.0, ans=0.0 2024-08-14 20:42:22,947 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 8550, loss[loss=0.1191, beats_loss=0.008074, ecapa_loss=0.0001757, whisper_loss=0.1093, over 18397.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01046, ecapa_loss=0.0001538, whisper_loss=0.09201, over 3845021.67 frames. ], batch size: 70, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:42:27,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2838920.0, ans=0.125 2024-08-14 20:42:31,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2838920.0, ans=0.2 2024-08-14 20:42:39,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2839020.0, ans=0.125 2024-08-14 20:42:43,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2839020.0, ans=0.125 2024-08-14 20:42:48,544 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 20:42:51,327 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
22 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-14 20:42:52,867 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 32 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-14 20:42:56,072 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 20:43:15,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2839220.0, ans=0.0 2024-08-14 20:43:30,598 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.97 vs. limit=15.0 2024-08-14 20:43:35,481 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 8600, loss[loss=0.1018, beats_loss=0.0102, ecapa_loss=0.0001646, whisper_loss=0.08994, over 21944.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01054, ecapa_loss=0.0001543, whisper_loss=0.09168, over 3894409.01 frames. ], batch size: 89, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:43:37,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2839420.0, ans=0.2 2024-08-14 20:43:48,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2839420.0, ans=0.07 2024-08-14 20:43:55,873 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.83 vs. limit=10.0 2024-08-14 20:43:58,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2839520.0, ans=0.125 2024-08-14 20:44:05,415 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
21 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-14 20:44:10,914 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.454e+01 2.758e+01 3.025e+01 4.750e+01, threshold=5.517e+01, percent-clipped=0.0 2024-08-14 20:44:35,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2839820.0, ans=0.125 2024-08-14 20:44:38,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2839820.0, ans=0.125 2024-08-14 20:44:49,458 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 8650, loss[loss=0.06257, beats_loss=0.01262, ecapa_loss=0.0001827, whisper_loss=0.04813, over 18837.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01054, ecapa_loss=0.0001542, whisper_loss=0.0912, over 3876423.65 frames. ], batch size: 84, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:45:04,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2839920.0, ans=0.0 2024-08-14 20:45:07,134 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 27 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-14 20:45:08,630 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
13 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-14 20:45:28,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2840120.0, ans=0.1 2024-08-14 20:45:32,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2840120.0, ans=0.025 2024-08-14 20:45:45,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2840220.0, ans=0.125 2024-08-14 20:45:45,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2840220.0, ans=0.125 2024-08-14 20:46:05,344 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 8700, loss[loss=0.1107, beats_loss=0.009642, ecapa_loss=0.0001965, whisper_loss=0.09905, over 21157.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01064, ecapa_loss=0.0001543, whisper_loss=0.09065, over 3859804.56 frames. ], batch size: 88, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:46:07,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2840420.0, ans=0.125 2024-08-14 20:46:08,597 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
16 from LS+wenet, 25 from Vox, 48 fro AS 2024-08-14 20:46:14,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2840420.0, ans=0.125 2024-08-14 20:46:17,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2840420.0, ans=0.0 2024-08-14 20:46:24,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2840520.0, ans=0.0 2024-08-14 20:46:27,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2840520.0, ans=0.125 2024-08-14 20:46:39,502 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.465e+01 2.655e+01 3.081e+01 6.274e+01, threshold=5.311e+01, percent-clipped=1.0 2024-08-14 20:46:43,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2840620.0, ans=0.125 2024-08-14 20:46:44,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2840620.0, ans=10.0 2024-08-14 20:47:08,093 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.45 vs. limit=6.0 2024-08-14 20:47:08,192 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.40 vs. limit=15.0 2024-08-14 20:47:11,169 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.48 vs. limit=15.0 2024-08-14 20:47:14,578 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
18 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 20:47:14,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2840820.0, ans=0.0 2024-08-14 20:47:17,276 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 8750, loss[loss=0.1054, beats_loss=0.01143, ecapa_loss=0.0001304, whisper_loss=0.09264, over 21263.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01062, ecapa_loss=0.0001544, whisper_loss=0.09052, over 3853794.04 frames. ], batch size: 83, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:47:20,523 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-14 20:47:42,929 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 20:47:46,199 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-14 20:48:03,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2841220.0, ans=0.0 2024-08-14 20:48:07,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2841220.0, ans=0.0 2024-08-14 20:48:19,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2841320.0, ans=0.125 2024-08-14 20:48:29,938 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 8800, loss[loss=0.0876, beats_loss=0.01118, ecapa_loss=0.0001278, whisper_loss=0.07513, over 14176.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0107, ecapa_loss=0.000154, whisper_loss=0.09047, over 3873268.33 frames. ], batch size: 55, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:48:37,216 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.56 vs. 
limit=6.0 2024-08-14 20:48:40,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2841420.0, ans=0.2 2024-08-14 20:48:44,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2841520.0, ans=0.1 2024-08-14 20:49:05,278 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.267e+01 2.535e+01 2.766e+01 4.137e+01, threshold=5.070e+01, percent-clipped=0.0 2024-08-14 20:49:06,216 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.32 vs. limit=15.0 2024-08-14 20:49:17,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2841720.0, ans=0.0 2024-08-14 20:49:33,203 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 23 from LS+wenet, 18 from Vox, 50 fro AS 2024-08-14 20:49:43,231 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 8850, loss[loss=0.1118, beats_loss=0.01004, ecapa_loss=0.0001112, whisper_loss=0.1007, over 24661.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01075, ecapa_loss=0.0001528, whisper_loss=0.09014, over 3871774.52 frames. ], batch size: 93, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:49:55,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2841920.0, ans=0.0 2024-08-14 20:50:09,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2842020.0, ans=0.0 2024-08-14 20:50:27,386 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 37 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-14 20:50:29,387 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.55 vs. 
limit=15.0 2024-08-14 20:50:46,667 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.87 vs. limit=15.0 2024-08-14 20:50:53,219 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 15 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-14 20:50:54,278 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 8900, loss[loss=0.07453, beats_loss=0.01149, ecapa_loss=0.0001472, whisper_loss=0.06157, over 18926.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0107, ecapa_loss=0.0001528, whisper_loss=0.09029, over 3876424.97 frames. ], batch size: 76, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:50:57,320 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-14 20:51:04,744 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 20:51:18,736 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-14 20:51:20,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2842520.0, ans=0.0 2024-08-14 20:51:26,615 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
28 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 20:51:28,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2842620.0, ans=0.0 2024-08-14 20:51:29,108 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.335e+01 2.555e+01 2.826e+01 4.520e+01, threshold=5.110e+01, percent-clipped=0.0 2024-08-14 20:51:32,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2842620.0, ans=0.0 2024-08-14 20:51:42,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2842720.0, ans=0.125 2024-08-14 20:51:48,432 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 15 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-14 20:52:06,240 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 8950, loss[loss=0.1002, beats_loss=0.01025, ecapa_loss=0.0001265, whisper_loss=0.08869, over 22781.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01077, ecapa_loss=0.0001526, whisper_loss=0.09029, over 3854760.06 frames. ], batch size: 85, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:52:41,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2843120.0, ans=0.125 2024-08-14 20:52:45,783 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-14 20:53:07,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2843320.0, ans=0.125 2024-08-14 20:53:18,862 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 9000, loss[loss=0.1101, beats_loss=0.01137, ecapa_loss=0.0001561, whisper_loss=0.09715, over 20586.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01078, ecapa_loss=0.000153, whisper_loss=0.09004, over 3840023.44 frames. 
], batch size: 81, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:53:18,863 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-14 20:54:01,005 INFO [train_multi_KD3.py:1149] (2/4) Epoch 20, validation on ASR_libri: loss=0.2527, beats_loss=0, ecapa_loss=0.0005268, whisper_loss=0.2474, over 922467.00 frames. 2024-08-14 20:54:15,152 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.2488, 2.6773, 3.1875, 1.6587, 1.9576, 2.1871, 3.1182, 2.9706], device='cuda:2') 2024-08-14 20:54:16,974 INFO [train_multi_KD3.py:1149] (2/4) Epoch 20, validation on SV_voxceleb1: loss=0.004208, beats_loss=0, ecapa_loss=0.0004208, whisper_loss=0, over 939242.00 frames. 2024-08-14 20:56:16,720 INFO [train_multi_KD3.py:1149] (2/4) Epoch 20, validation on AT_audioset: loss=0.0236, beats_loss=0.0236, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 20:56:16,724 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-14 20:56:17,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2843420.0, ans=0.125 2024-08-14 20:56:28,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2843420.0, ans=0.125 2024-08-14 20:56:40,292 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
33 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 20:56:43,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2843520.0, ans=0.125 2024-08-14 20:56:48,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2843620.0, ans=0.1 2024-08-14 20:56:51,975 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.250e+01 2.510e+01 2.872e+01 4.631e+01, threshold=5.020e+01, percent-clipped=0.0 2024-08-14 20:56:56,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2843620.0, ans=0.125 2024-08-14 20:57:06,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2843720.0, ans=0.0 2024-08-14 20:57:12,518 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 11 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-14 20:57:23,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2843820.0, ans=0.2 2024-08-14 20:57:29,880 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 9050, loss[loss=0.08552, beats_loss=0.00976, ecapa_loss=0.0001972, whisper_loss=0.07379, over 17560.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01067, ecapa_loss=0.0001538, whisper_loss=0.09087, over 3859227.67 frames. 
], batch size: 76, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:57:49,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2844020.0, ans=0.1 2024-08-14 20:58:00,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2844120.0, ans=0.0 2024-08-14 20:58:06,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2844120.0, ans=0.0 2024-08-14 20:58:11,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2844120.0, ans=0.0 2024-08-14 20:58:20,253 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-14 20:58:23,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2844220.0, ans=0.0 2024-08-14 20:58:25,843 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 20:58:43,195 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 9100, loss[loss=0.1168, beats_loss=0.009262, ecapa_loss=0.0001763, whisper_loss=0.1058, over 22155.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01064, ecapa_loss=0.0001542, whisper_loss=0.09134, over 3882552.00 frames. ], batch size: 91, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 20:58:50,401 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
21 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-14 20:58:55,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2844420.0, ans=0.0 2024-08-14 20:58:58,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2844520.0, ans=0.0 2024-08-14 20:59:03,843 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 19 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-14 20:59:05,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2844520.0, ans=0.09899494936611666 2024-08-14 20:59:12,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2844620.0, ans=0.0 2024-08-14 20:59:17,655 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.420e+01 2.655e+01 2.997e+01 1.110e+02, threshold=5.311e+01, percent-clipped=1.0 2024-08-14 20:59:18,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2844620.0, ans=0.1 2024-08-14 20:59:20,701 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-14 20:59:26,507 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 19 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-14 20:59:35,021 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 20 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-14 20:59:42,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2844820.0, ans=0.2 2024-08-14 20:59:52,661 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
16 from LS+wenet, 30 from Vox, 21 fro AS 2024-08-14 20:59:54,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2844920.0, ans=0.0 2024-08-14 20:59:55,169 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 9150, loss[loss=0.1123, beats_loss=0.0114, ecapa_loss=0.0001659, whisper_loss=0.09927, over 22974.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01066, ecapa_loss=0.0001543, whisper_loss=0.09138, over 3881838.56 frames. ], batch size: 92, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:00:07,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2844920.0, ans=0.0 2024-08-14 21:00:10,122 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 29 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-14 21:00:24,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2845120.0, ans=0.125 2024-08-14 21:00:25,699 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-14 21:00:27,703 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.02 vs. limit=6.0 2024-08-14 21:00:32,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2845120.0, ans=0.0 2024-08-14 21:00:57,815 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-14 21:00:59,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2845320.0, ans=0.125 2024-08-14 21:01:07,144 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 9200, loss[loss=0.108, beats_loss=0.01019, ecapa_loss=0.0001708, whisper_loss=0.0961, over 18859.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.01067, ecapa_loss=0.0001539, whisper_loss=0.09142, over 3905761.24 frames. ], batch size: 75, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:01:23,046 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 33 from Vox, 35 fro AS 2024-08-14 21:01:29,914 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 25 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 21:01:31,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2845520.0, ans=0.0 2024-08-14 21:01:41,294 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.415e+01 2.661e+01 2.941e+01 2.596e+02, threshold=5.321e+01, percent-clipped=3.0 2024-08-14 21:01:43,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2845620.0, ans=0.125 2024-08-14 21:01:47,305 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 21:02:13,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2845820.0, ans=0.0 2024-08-14 21:02:18,808 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 9250, loss[loss=0.1028, beats_loss=0.009188, ecapa_loss=0.0001778, whisper_loss=0.09188, over 17441.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01063, ecapa_loss=0.0001542, whisper_loss=0.09131, over 3942813.91 frames. ], batch size: 70, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:02:31,021 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 21:02:36,728 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
23 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-14 21:02:41,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2846020.0, ans=0.125 2024-08-14 21:03:16,219 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 21:03:17,544 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-14 21:03:33,459 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 9300, loss[loss=0.1089, beats_loss=0.01093, ecapa_loss=0.000149, whisper_loss=0.09645, over 20049.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01062, ecapa_loss=0.0001541, whisper_loss=0.09104, over 3937542.93 frames. ], batch size: 79, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:03:36,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2846420.0, ans=0.0 2024-08-14 21:03:38,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2846420.0, ans=0.125 2024-08-14 21:03:53,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2846520.0, ans=0.125 2024-08-14 21:03:55,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2846520.0, ans=0.125 2024-08-14 21:04:02,141 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 21:04:08,876 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.351e+01 2.533e+01 2.913e+01 3.870e+01, threshold=5.065e+01, percent-clipped=0.0 2024-08-14 21:04:15,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2846620.0, ans=0.1 2024-08-14 21:04:24,081 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.54 vs. limit=6.0 2024-08-14 21:04:43,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2846820.0, ans=0.125 2024-08-14 21:04:45,951 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 17 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 21:04:48,411 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 9350, loss[loss=0.1151, beats_loss=0.008503, ecapa_loss=0.0001575, whisper_loss=0.105, over 23980.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01059, ecapa_loss=0.0001547, whisper_loss=0.09114, over 3928970.80 frames. ], batch size: 92, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:04:57,777 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 14 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 21:05:14,442 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.51 vs. limit=15.0 2024-08-14 21:05:16,537 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 21:05:18,166 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 15 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-14 21:05:22,009 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.19 vs. 
limit=15.0 2024-08-14 21:05:25,759 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 21:05:30,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2847120.0, ans=0.0 2024-08-14 21:05:53,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2847320.0, ans=0.0 2024-08-14 21:06:00,107 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.99 vs. limit=5.0 2024-08-14 21:06:01,684 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 9400, loss[loss=0.09972, beats_loss=0.01169, ecapa_loss=0.0001194, whisper_loss=0.08683, over 22264.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01065, ecapa_loss=0.0001537, whisper_loss=0.09066, over 3924921.35 frames. ], batch size: 85, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:06:02,688 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.42 vs. limit=15.0 2024-08-14 21:06:12,264 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.87 vs. 
limit=15.0 2024-08-14 21:06:38,206 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.317e+01 2.592e+01 2.927e+01 3.881e+01, threshold=5.184e+01, percent-clipped=0.0 2024-08-14 21:06:52,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2847720.0, ans=0.125 2024-08-14 21:06:57,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2847720.0, ans=0.0 2024-08-14 21:06:58,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2847720.0, ans=0.025 2024-08-14 21:06:59,024 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.63 vs. limit=15.0 2024-08-14 21:07:15,142 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 9450, loss[loss=0.106, beats_loss=0.01052, ecapa_loss=0.0001419, whisper_loss=0.09405, over 21449.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.000154, whisper_loss=0.09091, over 3937983.98 frames. ], batch size: 82, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:07:26,452 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.87 vs. limit=15.0 2024-08-14 21:07:28,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2848020.0, ans=0.125 2024-08-14 21:07:34,208 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 21:07:39,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2848020.0, ans=0.0 2024-08-14 21:07:47,635 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
27 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-14 21:08:02,789 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 25 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-14 21:08:15,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2848320.0, ans=0.1 2024-08-14 21:08:24,212 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 17 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-14 21:08:28,288 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 9500, loss[loss=0.1104, beats_loss=0.007958, ecapa_loss=0.000157, whisper_loss=0.1009, over 17461.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01067, ecapa_loss=0.0001537, whisper_loss=0.09084, over 3948283.96 frames. ], batch size: 68, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:08:39,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2848420.0, ans=0.1 2024-08-14 21:09:03,854 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.637e+01 2.327e+01 2.619e+01 2.918e+01 1.778e+02, threshold=5.238e+01, percent-clipped=2.0 2024-08-14 21:09:14,409 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 17 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-14 21:09:31,920 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 14 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-14 21:09:41,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2848920.0, ans=0.125 2024-08-14 21:09:42,071 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 9550, loss[loss=0.1082, beats_loss=0.009892, ecapa_loss=0.0001722, whisper_loss=0.09657, over 16746.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01067, ecapa_loss=0.000154, whisper_loss=0.09017, over 3906739.79 frames. 
], batch size: 67, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:09:42,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2848920.0, ans=0.2 2024-08-14 21:09:48,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2848920.0, ans=0.125 2024-08-14 21:09:51,186 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-14 21:10:29,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2849220.0, ans=0.125 2024-08-14 21:10:29,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2849220.0, ans=0.0 2024-08-14 21:10:34,579 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 13 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-14 21:10:36,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2849220.0, ans=0.95 2024-08-14 21:10:37,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2849220.0, ans=0.2 2024-08-14 21:10:42,996 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 10 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 21:10:52,716 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 40 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-14 21:10:53,705 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 9600, loss[loss=0.1314, beats_loss=0.009675, ecapa_loss=0.0001385, whisper_loss=0.1203, over 24042.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01064, ecapa_loss=0.0001538, whisper_loss=0.09019, over 3895065.29 frames. 
], batch size: 93, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:10:57,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2849420.0, ans=0.125 2024-08-14 21:11:03,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2849420.0, ans=0.0 2024-08-14 21:11:11,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2849520.0, ans=0.0 2024-08-14 21:11:11,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2849520.0, ans=0.1 2024-08-14 21:11:15,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2849520.0, ans=0.2 2024-08-14 21:11:29,632 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.324e+01 2.593e+01 2.905e+01 4.004e+01, threshold=5.186e+01, percent-clipped=0.0 2024-08-14 21:11:30,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2849620.0, ans=0.125 2024-08-14 21:11:31,350 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-14 21:11:31,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2849620.0, ans=0.125 2024-08-14 21:11:36,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2849620.0, ans=0.0 2024-08-14 21:11:36,637 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.47 vs. limit=15.0 2024-08-14 21:11:43,252 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
34 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 21:11:50,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2849720.0, ans=0.05 2024-08-14 21:11:56,523 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 24 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-14 21:11:59,287 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-14 21:12:05,179 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 18 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-14 21:12:07,661 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 9650, loss[loss=0.1259, beats_loss=0.009812, ecapa_loss=0.0001176, whisper_loss=0.1149, over 24109.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01056, ecapa_loss=0.0001542, whisper_loss=0.0901, over 3873639.96 frames. ], batch size: 87, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:12:08,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2849920.0, ans=0.125 2024-08-14 21:12:17,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2849920.0, ans=0.1 2024-08-14 21:12:42,429 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-14 21:12:53,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2850220.0, ans=0.125 2024-08-14 21:12:56,351 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
24 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-14 21:13:08,186 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 21:13:18,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2850320.0, ans=0.0 2024-08-14 21:13:20,549 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 9700, loss[loss=0.09625, beats_loss=0.01023, ecapa_loss=0.0001461, whisper_loss=0.08456, over 20849.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01053, ecapa_loss=0.0001546, whisper_loss=0.09046, over 3899982.67 frames. ], batch size: 84, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:13:25,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2850420.0, ans=0.1 2024-08-14 21:13:33,277 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0 2024-08-14 21:13:51,629 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.02 vs. limit=22.5 2024-08-14 21:13:56,533 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.346e+01 2.562e+01 2.964e+01 3.831e+01, threshold=5.124e+01, percent-clipped=0.0 2024-08-14 21:13:59,787 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
25 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-14 21:14:00,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2850620.0, ans=0.125 2024-08-14 21:14:01,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2850620.0, ans=0.125 2024-08-14 21:14:04,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2850720.0, ans=0.0 2024-08-14 21:14:05,121 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-14 21:14:07,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2850720.0, ans=0.125 2024-08-14 21:14:29,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2850820.0, ans=0.125 2024-08-14 21:14:34,915 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 9750, loss[loss=0.1225, beats_loss=0.009942, ecapa_loss=0.0001674, whisper_loss=0.1109, over 23230.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01058, ecapa_loss=0.0001545, whisper_loss=0.09061, over 3893965.30 frames. ], batch size: 92, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:14:36,704 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 21 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-14 21:14:47,938 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.87 vs. 
limit=15.0 2024-08-14 21:14:52,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2851020.0, ans=0.1 2024-08-14 21:15:11,981 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.56 vs. limit=15.0 2024-08-14 21:15:15,811 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 32 from Vox, 28 fro AS 2024-08-14 21:15:20,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2851220.0, ans=0.0 2024-08-14 21:15:24,672 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-14 21:15:35,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2851320.0, ans=0.1 2024-08-14 21:15:36,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2851320.0, ans=0.0 2024-08-14 21:15:49,389 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 9800, loss[loss=0.09817, beats_loss=0.01155, ecapa_loss=0.0001351, whisper_loss=0.08527, over 21701.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01056, ecapa_loss=0.000156, whisper_loss=0.09054, over 3874079.00 frames. ], batch size: 84, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:15:54,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2851420.0, ans=0.125 2024-08-14 21:16:10,824 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
32 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 21:16:22,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2851620.0, ans=0.04949747468305833 2024-08-14 21:16:24,145 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 21:16:25,418 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.297e+01 2.616e+01 2.876e+01 8.897e+01, threshold=5.231e+01, percent-clipped=1.0 2024-08-14 21:16:27,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2851620.0, ans=0.0 2024-08-14 21:16:29,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2851620.0, ans=0.09899494936611666 2024-08-14 21:16:35,931 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-14 21:16:37,357 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-14 21:16:41,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2851720.0, ans=0.125 2024-08-14 21:16:58,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2851820.0, ans=0.125 2024-08-14 21:17:03,717 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 9850, loss[loss=0.1097, beats_loss=0.01112, ecapa_loss=0.0001102, whisper_loss=0.09744, over 24740.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01062, ecapa_loss=0.000155, whisper_loss=0.09083, over 3895771.91 frames. 
], batch size: 91, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:17:06,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2851920.0, ans=0.04949747468305833 2024-08-14 21:17:10,313 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 32 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-14 21:17:28,363 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2024-08-14 21:17:31,090 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-14 21:17:39,803 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.89 vs. limit=12.0 2024-08-14 21:17:58,884 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-14 21:18:06,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2852320.0, ans=0.125 2024-08-14 21:18:14,580 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.95 vs. limit=22.5 2024-08-14 21:18:18,417 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 9900, loss[loss=0.09793, beats_loss=0.01372, ecapa_loss=0.0001359, whisper_loss=0.08285, over 18083.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01071, ecapa_loss=0.0001537, whisper_loss=0.09093, over 3870232.42 frames. ], batch size: 72, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:18:33,791 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-14 21:18:53,028 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
26 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-14 21:18:54,341 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.391e+01 2.621e+01 2.869e+01 9.364e+01, threshold=5.242e+01, percent-clipped=1.0 2024-08-14 21:18:55,989 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 24 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-14 21:19:03,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2852720.0, ans=0.0 2024-08-14 21:19:10,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2852720.0, ans=0.07 2024-08-14 21:19:10,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2852720.0, ans=0.125 2024-08-14 21:19:11,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2852720.0, ans=0.125 2024-08-14 21:19:35,082 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 9950, loss[loss=0.1192, beats_loss=0.009485, ecapa_loss=0.0001952, whisper_loss=0.1078, over 14239.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01066, ecapa_loss=0.0001548, whisper_loss=0.09101, over 3842761.11 frames. ], batch size: 58, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:19:35,433 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 30 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-14 21:19:45,311 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 16 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-14 21:19:53,033 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-14 21:19:57,993 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
27 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 21:20:49,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2853320.0, ans=0.2 2024-08-14 21:20:51,969 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 10000, loss[loss=0.09844, beats_loss=0.01134, ecapa_loss=0.0001564, whisper_loss=0.08553, over 14817.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01068, ecapa_loss=0.0001552, whisper_loss=0.09064, over 3787655.71 frames. ], batch size: 60, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:21:00,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2853420.0, ans=0.125 2024-08-14 21:21:01,814 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 21:21:05,362 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.466e-02 2024-08-14 21:21:08,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2853520.0, ans=0.0 2024-08-14 21:21:17,780 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.85 vs. limit=15.0 2024-08-14 21:21:28,832 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.381e+01 2.626e+01 2.960e+01 1.740e+02, threshold=5.252e+01, percent-clipped=1.0 2024-08-14 21:21:52,040 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
31 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 21:21:54,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2853820.0, ans=0.125 2024-08-14 21:21:54,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2853820.0, ans=15.0 2024-08-14 21:22:08,822 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 10050, loss[loss=0.09947, beats_loss=0.01144, ecapa_loss=0.0001297, whisper_loss=0.08673, over 18620.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01062, ecapa_loss=0.0001546, whisper_loss=0.09106, over 3826242.62 frames. ], batch size: 73, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:22:12,195 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-14 21:22:29,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2854020.0, ans=0.2 2024-08-14 21:22:41,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2854120.0, ans=0.1 2024-08-14 21:22:48,121 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
19 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-14 21:22:52,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2854120.0, ans=0.1 2024-08-14 21:23:10,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2854220.0, ans=0.125 2024-08-14 21:23:15,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2854320.0, ans=0.0 2024-08-14 21:23:30,689 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 10100, loss[loss=0.1242, beats_loss=0.007676, ecapa_loss=0.0001749, whisper_loss=0.1147, over 20324.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0107, ecapa_loss=0.0001547, whisper_loss=0.09051, over 3858146.73 frames. ], batch size: 76, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:23:38,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2854420.0, ans=0.1 2024-08-14 21:23:52,605 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.18 vs. limit=12.0 2024-08-14 21:23:55,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2854520.0, ans=0.125 2024-08-14 21:24:06,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2854620.0, ans=0.0 2024-08-14 21:24:10,652 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.362e+01 2.668e+01 2.989e+01 1.433e+02, threshold=5.336e+01, percent-clipped=3.0 2024-08-14 21:24:10,840 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
16 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-14 21:24:23,203 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.24 vs. limit=15.0 2024-08-14 21:24:43,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2854820.0, ans=0.2 2024-08-14 21:24:51,349 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 23 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 21:24:52,688 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 10150, loss[loss=0.09676, beats_loss=0.01166, ecapa_loss=0.0001462, whisper_loss=0.08364, over 21456.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01072, ecapa_loss=0.0001548, whisper_loss=0.09024, over 3885630.83 frames. ], batch size: 86, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:25:04,639 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 21:25:15,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2855020.0, ans=0.0 2024-08-14 21:25:22,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2855020.0, ans=0.0 2024-08-14 21:25:35,403 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-14 21:25:47,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2855220.0, ans=0.2 2024-08-14 21:26:06,680 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 22 from LS+wenet, 25 from Vox, 47 fro AS 2024-08-14 21:26:10,700 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 10200, loss[loss=0.1161, beats_loss=0.01081, ecapa_loss=0.0001274, whisper_loss=0.104, over 17912.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01068, ecapa_loss=0.0001551, whisper_loss=0.09071, over 3909516.99 frames. 
], batch size: 67, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:26:14,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2855420.0, ans=0.125 2024-08-14 21:26:26,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2855520.0, ans=0.125 2024-08-14 21:26:41,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2855620.0, ans=0.125 2024-08-14 21:26:42,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2855620.0, ans=0.0 2024-08-14 21:26:43,538 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 21:26:46,186 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.694e+01 2.377e+01 2.660e+01 3.071e+01 4.492e+01, threshold=5.321e+01, percent-clipped=0.0 2024-08-14 21:26:47,015 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.95 vs. limit=10.0 2024-08-14 21:27:16,256 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.69 vs. limit=12.0 2024-08-14 21:27:23,819 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 10250, loss[loss=0.09825, beats_loss=0.01161, ecapa_loss=0.0001179, whisper_loss=0.08546, over 17890.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01069, ecapa_loss=0.0001548, whisper_loss=0.09093, over 3939016.88 frames. ], batch size: 70, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:27:33,788 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.87 vs. 
limit=10.0 2024-08-14 21:27:43,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2856020.0, ans=0.05 2024-08-14 21:27:51,244 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.26 vs. limit=15.0 2024-08-14 21:28:38,188 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 10300, loss[loss=0.09944, beats_loss=0.01012, ecapa_loss=0.0001658, whisper_loss=0.08766, over 21406.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01073, ecapa_loss=0.0001545, whisper_loss=0.08979, over 3916388.42 frames. ], batch size: 88, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:29:02,334 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 21:29:06,117 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.38 vs. limit=15.0 2024-08-14 21:29:14,011 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.734e+01 2.284e+01 2.585e+01 2.983e+01 4.241e+01, threshold=5.169e+01, percent-clipped=0.0 2024-08-14 21:29:28,831 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.27 vs. limit=15.0 2024-08-14 21:29:53,032 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 10350, loss[loss=0.1028, beats_loss=0.009725, ecapa_loss=0.0001498, whisper_loss=0.09155, over 21567.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01072, ecapa_loss=0.0001544, whisper_loss=0.09064, over 3926406.64 frames. ], batch size: 85, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:30:13,588 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-14 21:30:23,256 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
27 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-14 21:30:30,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2857120.0, ans=0.0 2024-08-14 21:30:33,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2857120.0, ans=0.125 2024-08-14 21:30:51,591 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 21:30:51,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2857320.0, ans=0.07 2024-08-14 21:30:52,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2857320.0, ans=0.0 2024-08-14 21:31:18,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2857320.0, ans=0.0 2024-08-14 21:31:31,655 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 10400, loss[loss=0.1153, beats_loss=0.01051, ecapa_loss=9.958e-05, whisper_loss=0.1038, over 16056.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01068, ecapa_loss=0.0001537, whisper_loss=0.09085, over 3934053.68 frames. 
], batch size: 56, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:31:50,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2857520.0, ans=0.2 2024-08-14 21:32:00,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2857520.0, ans=0.125 2024-08-14 21:32:14,825 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.400e+01 2.611e+01 2.963e+01 4.216e+01, threshold=5.223e+01, percent-clipped=0.0 2024-08-14 21:32:17,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2857620.0, ans=0.0 2024-08-14 21:32:30,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2857720.0, ans=0.0 2024-08-14 21:32:31,738 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 18 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-14 21:32:33,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2857720.0, ans=0.125 2024-08-14 21:32:43,841 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-14 21:32:52,019 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 33 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-14 21:32:59,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2857920.0, ans=0.07 2024-08-14 21:32:59,978 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 10450, loss[loss=0.08953, beats_loss=0.008969, ecapa_loss=0.0001424, whisper_loss=0.07913, over 14750.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01066, ecapa_loss=0.0001537, whisper_loss=0.09092, over 3938443.43 frames. 
], batch size: 56, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:33:13,885 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.08 vs. limit=15.0 2024-08-14 21:33:25,411 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 20 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 21:33:43,503 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-14 21:33:45,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2858120.0, ans=0.125 2024-08-14 21:33:49,797 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.35 vs. limit=15.0 2024-08-14 21:33:53,302 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-14 21:33:54,310 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.11 vs. limit=15.0 2024-08-14 21:34:00,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2858220.0, ans=0.125 2024-08-14 21:34:04,866 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.56 vs. limit=10.0 2024-08-14 21:34:09,637 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 26 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 21:34:20,853 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
20 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 21:34:22,475 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2858320.0, ans=0.1 2024-08-14 21:34:29,172 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 10500, loss[loss=0.1031, beats_loss=0.01113, ecapa_loss=0.0001743, whisper_loss=0.09023, over 20681.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01068, ecapa_loss=0.0001526, whisper_loss=0.09097, over 3955908.14 frames. ], batch size: 84, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:34:29,505 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-14 21:34:51,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2858520.0, ans=0.2 2024-08-14 21:34:51,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2858520.0, ans=15.0 2024-08-14 21:34:59,648 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.388e+05 2024-08-14 21:35:11,010 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.314e+01 2.587e+01 2.967e+01 4.494e+01, threshold=5.174e+01, percent-clipped=0.0 2024-08-14 21:35:12,955 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-14 21:35:15,057 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.35 vs. 
limit=15.0 2024-08-14 21:35:22,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2858720.0, ans=0.0 2024-08-14 21:35:38,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2858820.0, ans=0.1 2024-08-14 21:35:49,193 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.89 vs. limit=10.0 2024-08-14 21:35:53,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2858820.0, ans=0.125 2024-08-14 21:35:55,529 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0 2024-08-14 21:35:56,186 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 10550, loss[loss=0.1227, beats_loss=0.00858, ecapa_loss=0.0001669, whisper_loss=0.1124, over 22839.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01067, ecapa_loss=0.0001533, whisper_loss=0.0912, over 3910827.50 frames. ], batch size: 91, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:36:00,170 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-14 21:36:05,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2858920.0, ans=0.0 2024-08-14 21:36:27,017 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 29 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-14 21:36:28,742 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.87 vs. 
limit=15.0 2024-08-14 21:36:40,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2859120.0, ans=0.0 2024-08-14 21:36:51,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2859220.0, ans=0.0 2024-08-14 21:37:25,086 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 10600, loss[loss=0.1024, beats_loss=0.006484, ecapa_loss=0.0001559, whisper_loss=0.09437, over 15214.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01068, ecapa_loss=0.000153, whisper_loss=0.09064, over 3900098.36 frames. ], batch size: 55, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:37:27,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2859420.0, ans=0.125 2024-08-14 21:37:27,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2859420.0, ans=0.09899494936611666 2024-08-14 21:37:30,691 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 20 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-14 21:37:31,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2859420.0, ans=0.125 2024-08-14 21:37:56,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2859520.0, ans=0.125 2024-08-14 21:38:07,046 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.377e+01 2.615e+01 3.017e+01 5.904e+01, threshold=5.231e+01, percent-clipped=2.0 2024-08-14 21:38:07,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2859620.0, ans=0.1 2024-08-14 21:38:18,756 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
24 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-14 21:38:19,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2859720.0, ans=0.125 2024-08-14 21:38:41,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2859820.0, ans=0.1 2024-08-14 21:38:52,469 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 10650, loss[loss=0.09581, beats_loss=0.0123, ecapa_loss=0.0001405, whisper_loss=0.08211, over 23159.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01063, ecapa_loss=0.0001516, whisper_loss=0.09077, over 3871120.96 frames. ], batch size: 92, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:39:03,478 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 21:39:04,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2859920.0, ans=0.0 2024-08-14 21:39:10,945 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=15.0 2024-08-14 21:39:13,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2860020.0, ans=0.0 2024-08-14 21:39:30,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2860120.0, ans=0.125 2024-08-14 21:39:48,180 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
33 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 21:39:58,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2860320.0, ans=0.1 2024-08-14 21:40:15,842 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 10700, loss[loss=0.1014, beats_loss=0.0123, ecapa_loss=0.0001666, whisper_loss=0.08739, over 21459.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0106, ecapa_loss=0.0001527, whisper_loss=0.09107, over 3878115.54 frames. ], batch size: 91, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:40:23,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2860420.0, ans=0.0 2024-08-14 21:40:42,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2860520.0, ans=0.125 2024-08-14 21:40:42,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2860520.0, ans=0.125 2024-08-14 21:40:57,178 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.378e+01 2.610e+01 2.912e+01 4.621e+02, threshold=5.220e+01, percent-clipped=2.0 2024-08-14 21:41:08,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2860720.0, ans=0.0 2024-08-14 21:41:26,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2860820.0, ans=0.0 2024-08-14 21:41:26,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2860820.0, ans=0.125 2024-08-14 21:41:38,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2860820.0, ans=0.2 2024-08-14 21:41:40,898 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 10750, 
loss[loss=0.112, beats_loss=0.008382, ecapa_loss=0.0001878, whisper_loss=0.1017, over 15727.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01058, ecapa_loss=0.0001535, whisper_loss=0.0919, over 3847159.78 frames. ], batch size: 64, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:41:49,667 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 21:41:50,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2860920.0, ans=0.0 2024-08-14 21:42:07,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2861020.0, ans=0.0 2024-08-14 21:42:18,250 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 20 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-14 21:42:44,369 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 36 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-14 21:42:58,274 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-14 21:43:09,086 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 10800, loss[loss=0.09599, beats_loss=0.00957, ecapa_loss=0.0001317, whisper_loss=0.08511, over 17982.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01066, ecapa_loss=0.0001524, whisper_loss=0.09149, over 3876570.61 frames. 
], batch size: 69, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:43:11,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2861420.0, ans=0.04949747468305833 2024-08-14 21:43:24,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2861520.0, ans=0.1 2024-08-14 21:43:49,120 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.222e+01 2.555e+01 2.864e+01 4.186e+01, threshold=5.109e+01, percent-clipped=0.0 2024-08-14 21:44:34,063 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 10850, loss[loss=0.1194, beats_loss=0.00783, ecapa_loss=0.0001505, whisper_loss=0.1101, over 20528.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01061, ecapa_loss=0.000153, whisper_loss=0.09231, over 3912236.88 frames. ], batch size: 79, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:44:43,316 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.74 vs. limit=10.0 2024-08-14 21:44:43,358 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.84 vs. limit=15.0 2024-08-14 21:44:48,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2861920.0, ans=0.1 2024-08-14 21:44:53,367 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.84 vs. 
limit=15.0 2024-08-14 21:45:04,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2862020.0, ans=0.1 2024-08-14 21:45:11,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2862120.0, ans=0.0 2024-08-14 21:45:22,649 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.17 vs. limit=15.0 2024-08-14 21:45:36,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2862220.0, ans=0.125 2024-08-14 21:45:48,457 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.14 vs. limit=15.0 2024-08-14 21:45:59,114 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 10900, loss[loss=0.1043, beats_loss=0.01043, ecapa_loss=0.000139, whisper_loss=0.09244, over 14357.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01064, ecapa_loss=0.0001525, whisper_loss=0.09224, over 3914496.09 frames. ], batch size: 53, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:46:00,966 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-14 21:46:01,832 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.02 vs. limit=22.5 2024-08-14 21:46:17,027 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 38 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-14 21:46:19,947 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
20 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-14 21:46:25,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2862520.0, ans=0.125 2024-08-14 21:46:27,269 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.80 vs. limit=12.0 2024-08-14 21:46:39,960 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.360e+01 2.623e+01 2.983e+01 2.734e+02, threshold=5.246e+01, percent-clipped=0.0 2024-08-14 21:46:48,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2862720.0, ans=0.2 2024-08-14 21:46:48,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2862720.0, ans=0.04949747468305833 2024-08-14 21:46:49,735 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 14 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-14 21:46:53,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2862720.0, ans=0.125 2024-08-14 21:47:24,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2862920.0, ans=0.05 2024-08-14 21:47:25,508 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 10950, loss[loss=0.1237, beats_loss=0.008408, ecapa_loss=0.0001788, whisper_loss=0.1135, over 13568.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01059, ecapa_loss=0.0001527, whisper_loss=0.09279, over 3918537.98 frames. ], batch size: 54, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:47:26,092 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
23 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-14 21:47:32,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2862920.0, ans=0.0 2024-08-14 21:47:38,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2862920.0, ans=0.1 2024-08-14 21:47:41,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2863020.0, ans=0.1 2024-08-14 21:47:43,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2863020.0, ans=0.0 2024-08-14 21:48:03,837 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-14 21:48:23,614 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 20 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 21:48:25,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2863220.0, ans=0.07 2024-08-14 21:48:27,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2863220.0, ans=0.035 2024-08-14 21:48:27,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2863220.0, ans=0.125 2024-08-14 21:48:34,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2863320.0, ans=0.0 2024-08-14 21:48:39,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2863320.0, ans=0.0 2024-08-14 21:48:41,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2863320.0, ans=0.09899494936611666 2024-08-14 21:48:41,366 INFO [scaling.py:214] (2/4) ScheduledFloat: 
name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2863320.0, ans=0.025 2024-08-14 21:48:45,920 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-14 21:48:50,387 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 11000, loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001469, whisper_loss=0.09022, over 23142.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01057, ecapa_loss=0.0001541, whisper_loss=0.09255, over 3932741.02 frames. ], batch size: 91, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:49:00,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2863420.0, ans=0.1 2024-08-14 21:49:00,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2863420.0, ans=0.2 2024-08-14 21:49:03,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2863420.0, ans=0.0 2024-08-14 21:49:03,995 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. 
limit=15.0 2024-08-14 21:49:19,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2863520.0, ans=0.125 2024-08-14 21:49:24,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2863620.0, ans=0.125 2024-08-14 21:49:29,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2863620.0, ans=0.125 2024-08-14 21:49:30,873 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.380e+01 2.630e+01 2.844e+01 1.265e+02, threshold=5.261e+01, percent-clipped=2.0 2024-08-14 21:49:42,792 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.34 vs. limit=10.0 2024-08-14 21:49:56,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2863720.0, ans=0.1 2024-08-14 21:49:56,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2863720.0, ans=0.2 2024-08-14 21:50:07,584 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 21:50:15,523 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 11050, loss[loss=0.1018, beats_loss=0.0103, ecapa_loss=0.0001266, whisper_loss=0.09018, over 16952.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01061, ecapa_loss=0.0001538, whisper_loss=0.09161, over 3939053.16 frames. ], batch size: 62, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:50:16,229 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.84 vs. 
limit=15.0 2024-08-14 21:50:28,631 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.17 vs. limit=15.0 2024-08-14 21:50:32,935 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-14 21:50:34,143 WARNING [optim.py:496] (2/4) Scaling gradients by 0.04806042090058327, model_norm_threshold=52.6092414855957 2024-08-14 21:50:34,308 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.31, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.750e+05, grad_sumsq=3.750e+05, orig_rms_sq=1.000e+00 2024-08-14 21:50:44,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2864020.0, ans=0.2 2024-08-14 21:50:45,874 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 37 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 21:51:11,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2864220.0, ans=0.1 2024-08-14 21:51:11,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2864220.0, ans=0.04949747468305833 2024-08-14 21:51:14,645 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 20 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 21:51:25,307 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.04 vs. limit=15.0 2024-08-14 21:51:28,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2864320.0, ans=0.035 2024-08-14 21:51:39,828 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 11100, loss[loss=0.1054, beats_loss=0.01026, ecapa_loss=0.0001681, whisper_loss=0.09344, over 20117.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01059, ecapa_loss=0.0001549, whisper_loss=0.09128, over 3925941.95 frames. ], batch size: 81, lr: 3.10e-03, grad_scale: 1.152921504606847e+18 2024-08-14 21:51:49,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2864420.0, ans=0.125 2024-08-14 21:51:51,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2864420.0, ans=0.125 2024-08-14 21:52:01,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2864520.0, ans=0.125 2024-08-14 21:52:02,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2864520.0, ans=0.125 2024-08-14 21:52:19,837 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.327e+01 2.589e+01 2.870e+01 1.095e+03, threshold=5.179e+01, percent-clipped=2.0 2024-08-14 21:52:22,933 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.61 vs. limit=15.0 2024-08-14 21:52:37,376 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-14 21:52:43,349 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-14 21:52:55,936 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 29 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 21:52:58,952 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.60 vs. limit=22.5 2024-08-14 21:53:04,725 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 11150, loss[loss=0.1089, beats_loss=0.01237, ecapa_loss=0.000123, whisper_loss=0.09534, over 22929.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.01059, ecapa_loss=0.0001545, whisper_loss=0.0915, over 3886115.28 frames. ], batch size: 91, lr: 3.10e-03, grad_scale: 1.152921504606847e+18 2024-08-14 21:53:08,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2864920.0, ans=0.125 2024-08-14 21:53:20,711 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.35 vs. limit=10.0 2024-08-14 21:53:40,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2865120.0, ans=0.0 2024-08-14 21:53:43,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2865120.0, ans=0.1 2024-08-14 21:53:47,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2865120.0, ans=0.125 2024-08-14 21:54:01,923 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 17 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-14 21:54:17,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2865320.0, ans=0.0 2024-08-14 21:54:23,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2865320.0, ans=0.125 2024-08-14 21:54:28,296 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 11200, loss[loss=0.139, beats_loss=0.009227, ecapa_loss=0.0001804, whisper_loss=0.128, over 20956.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0105, ecapa_loss=0.0001551, whisper_loss=0.0916, over 3869751.63 frames. ], batch size: 80, lr: 3.10e-03, grad_scale: 1.152921504606847e+18 2024-08-14 21:54:33,254 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
18 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 21:54:47,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2865520.0, ans=0.125 2024-08-14 21:54:50,029 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 21:55:03,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2865620.0, ans=0.125 2024-08-14 21:55:07,474 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.313e+01 2.527e+01 2.769e+01 5.053e+01, threshold=5.054e+01, percent-clipped=0.0 2024-08-14 21:55:40,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2865820.0, ans=0.125 2024-08-14 21:55:52,375 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 11250, loss[loss=0.1328, beats_loss=0.006005, ecapa_loss=0.0001713, whisper_loss=0.1251, over 17652.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01052, ecapa_loss=0.0001549, whisper_loss=0.09145, over 3855978.11 frames. 
], batch size: 67, lr: 3.10e-03, grad_scale: 1.152921504606847e+18 2024-08-14 21:56:11,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2866020.0, ans=0.0 2024-08-14 21:56:18,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2866020.0, ans=0.2 2024-08-14 21:56:26,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2866120.0, ans=0.0 2024-08-14 21:56:31,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2866120.0, ans=0.125 2024-08-14 21:56:31,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2866120.0, ans=0.125 2024-08-14 21:56:33,491 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.73 vs. limit=8.0 2024-08-14 21:56:37,638 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.55 vs. limit=15.0 2024-08-14 21:56:50,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2866220.0, ans=0.125 2024-08-14 21:56:58,954 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 28 from Vox, 25 fro AS 2024-08-14 21:57:02,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2866320.0, ans=0.0 2024-08-14 21:57:11,406 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.02 vs. limit=15.0 2024-08-14 21:57:12,448 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
18 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-14 21:57:18,228 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 11300, loss[loss=0.08615, beats_loss=0.01014, ecapa_loss=0.0001662, whisper_loss=0.07436, over 14730.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0105, ecapa_loss=0.0001539, whisper_loss=0.09102, over 3844484.68 frames. ], batch size: 60, lr: 3.10e-03, grad_scale: 1.152921504606847e+18 2024-08-14 21:57:18,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2866420.0, ans=0.125 2024-08-14 21:57:18,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2866420.0, ans=0.125 2024-08-14 21:57:48,594 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 21:57:58,624 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.341e+01 2.610e+01 2.928e+01 1.579e+02, threshold=5.221e+01, percent-clipped=2.0 2024-08-14 21:58:04,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2866620.0, ans=0.125 2024-08-14 21:58:41,862 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 11350, loss[loss=0.09208, beats_loss=0.008132, ecapa_loss=0.0001772, whisper_loss=0.08218, over 13488.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01049, ecapa_loss=0.000153, whisper_loss=0.09137, over 3884087.19 frames. ], batch size: 54, lr: 3.10e-03, grad_scale: 1.152921504606847e+18 2024-08-14 21:58:43,726 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 20 from LS+wenet, 35 from Vox, 36 fro AS 2024-08-14 21:58:57,180 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
28 from LS+wenet, 18 from Vox, 20 from AS
2024-08-14 21:59:05,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2867020.0, ans=0.0
2024-08-14 21:59:07,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2867020.0, ans=0.1
2024-08-14 21:59:17,551 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0
2024-08-14 21:59:41,696 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 20 from LS+wenet, 19 from Vox, 35 from AS
2024-08-14 21:59:48,559 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 from AS
2024-08-14 21:59:57,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2867320.0, ans=0.125
2024-08-14 22:00:04,904 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 34 from Vox, 31 from AS
2024-08-14 22:00:08,087 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 11400, loss[loss=0.1006, beats_loss=0.01074, ecapa_loss=0.0001496, whisper_loss=0.08839, over 21618.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01051, ecapa_loss=0.0001519, whisper_loss=0.09137, over 3903136.11 frames. ], batch size: 86, lr: 3.10e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:00:32,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2867520.0, ans=0.0
2024-08-14 22:00:35,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2867520.0, ans=0.1
2024-08-14 22:00:38,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2867520.0, ans=0.125
2024-08-14 22:00:45,218 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 24 from Vox, 44 from AS
2024-08-14 22:00:46,807 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 22 from Vox, 34 from AS
2024-08-14 22:00:49,823 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.381e+01 2.566e+01 2.831e+01 4.188e+01, threshold=5.133e+01, percent-clipped=0.0
2024-08-14 22:01:12,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2867720.0, ans=0.0
2024-08-14 22:01:12,650 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.05 vs. limit=12.0
2024-08-14 22:01:18,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2867820.0, ans=0.125
2024-08-14 22:01:28,709 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.00 vs. limit=15.0
2024-08-14 22:01:28,738 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.96 vs. limit=15.0
2024-08-14 22:01:31,236 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 11450, loss[loss=0.08516, beats_loss=0.01104, ecapa_loss=0.0001521, whisper_loss=0.0726, over 17507.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0105, ecapa_loss=0.0001525, whisper_loss=0.09098, over 3864952.23 frames. ], batch size: 72, lr: 3.10e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:02:08,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2868120.0, ans=0.125
2024-08-14 22:02:37,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2868320.0, ans=0.0
2024-08-14 22:02:50,207 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 19 from Vox, 21 from AS
2024-08-14 22:02:52,063 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 20 from LS+wenet, 19 from Vox, 32 from AS
2024-08-14 22:02:53,148 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 11500, loss[loss=0.09718, beats_loss=0.01041, ecapa_loss=0.0001442, whisper_loss=0.08532, over 17943.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01052, ecapa_loss=0.0001514, whisper_loss=0.09095, over 3851217.49 frames. ], batch size: 71, lr: 3.10e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:02:54,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2868420.0, ans=0.125
2024-08-14 22:02:57,013 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 25 from Vox, 26 from AS
2024-08-14 22:03:02,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2868420.0, ans=0.125
2024-08-14 22:03:09,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2868520.0, ans=0.0
2024-08-14 22:03:19,984 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 22 from LS+wenet, 17 from Vox, 19 from AS
2024-08-14 22:03:22,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2868520.0, ans=0.125
2024-08-14 22:03:34,595 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.455e+01 2.723e+01 3.029e+01 4.016e+01, threshold=5.445e+01, percent-clipped=0.0
2024-08-14 22:03:37,116 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 20 from LS+wenet, 23 from Vox, 38 from AS
2024-08-14 22:03:47,436 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 22 from Vox, 35 from AS
2024-08-14 22:04:02,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2868820.0, ans=0.2
2024-08-14 22:04:04,563 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 28 from LS+wenet, 25 from Vox, 22 from AS
2024-08-14 22:04:07,914 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 17 from Vox, 30 from AS
2024-08-14 22:04:18,423 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 11550, loss[loss=0.1081, beats_loss=0.01077, ecapa_loss=0.000146, whisper_loss=0.09584, over 17937.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01051, ecapa_loss=0.0001523, whisper_loss=0.09152, over 3861502.81 frames. ], batch size: 69, lr: 3.10e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:04:19,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2868920.0, ans=0.125
2024-08-14 22:04:25,500 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 22 from LS+wenet, 15 from Vox, 18 from AS
2024-08-14 22:04:59,987 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 15 from LS+wenet, 18 from Vox, 32 from AS
2024-08-14 22:05:23,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2869320.0, ans=0.125
2024-08-14 22:05:39,788 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 11600, loss[loss=0.07428, beats_loss=0.01048, ecapa_loss=0.0001652, whisper_loss=0.06215, over 19399.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01069, ecapa_loss=0.0001521, whisper_loss=0.09069, over 3899755.75 frames. ], batch size: 80, lr: 3.10e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:06:08,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2869520.0, ans=0.1
2024-08-14 22:06:19,927 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.685e+01 2.332e+01 2.648e+01 3.162e+01 2.380e+02, threshold=5.297e+01, percent-clipped=2.0
2024-08-14 22:06:33,575 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-14 22:06:41,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2869720.0, ans=0.025
2024-08-14 22:07:00,498 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 11650, loss[loss=0.1005, beats_loss=0.009802, ecapa_loss=0.0001441, whisper_loss=0.08921, over 19923.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0106, ecapa_loss=0.0001532, whisper_loss=0.09126, over 3908660.63 frames. ], batch size: 77, lr: 3.10e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:07:43,183 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 17 from Vox, 35 from AS
2024-08-14 22:07:55,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2870220.0, ans=0.125
2024-08-14 22:07:56,319 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 23 from LS+wenet, 21 from Vox, 30 from AS
2024-08-14 22:08:01,590 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2870220.0, ans=0.025
2024-08-14 22:08:23,708 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 11700, loss[loss=0.1186, beats_loss=0.01101, ecapa_loss=0.0001224, whisper_loss=0.1063, over 16679.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01068, ecapa_loss=0.0001525, whisper_loss=0.09105, over 3913021.27 frames. ], batch size: 62, lr: 3.10e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:08:25,802 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 25 from Vox, 31 from AS
2024-08-14 22:08:47,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2870520.0, ans=0.125
2024-08-14 22:08:53,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2870520.0, ans=0.2
2024-08-14 22:08:55,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2870520.0, ans=0.2
2024-08-14 22:09:02,754 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 21 from Vox, 36 from AS
2024-08-14 22:09:06,461 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.421e+01 2.717e+01 3.040e+01 4.718e+01, threshold=5.433e+01, percent-clipped=0.0
2024-08-14 22:09:06,653 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 from AS
2024-08-14 22:09:08,184 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 25 from LS+wenet, 23 from Vox, 21 from AS
2024-08-14 22:09:21,750 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 from AS
2024-08-14 22:09:24,575 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 22 from LS+wenet, 21 from Vox, 36 from AS
2024-08-14 22:09:26,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2870720.0, ans=0.0
2024-08-14 22:09:45,262 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 11750, loss[loss=0.08743, beats_loss=0.01256, ecapa_loss=0.0001274, whisper_loss=0.07359, over 18858.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01065, ecapa_loss=0.0001526, whisper_loss=0.09127, over 3920045.02 frames. ], batch size: 75, lr: 3.10e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:10:45,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2871220.0, ans=0.0
2024-08-14 22:11:08,789 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 17 from Vox, 48 from AS
2024-08-14 22:11:22,270 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 11800, loss[loss=0.1191, beats_loss=0.009668, ecapa_loss=0.0001731, whisper_loss=0.1077, over 16119.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0107, ecapa_loss=0.0001523, whisper_loss=0.09198, over 3959453.63 frames. ], batch size: 62, lr: 3.09e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:11:42,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2871520.0, ans=0.125
2024-08-14 22:11:50,945 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.16 vs. limit=15.0
2024-08-14 22:11:52,156 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.426e-02
2024-08-14 22:12:00,293 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 12 from Vox, 34 from AS
2024-08-14 22:12:03,173 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.341e+01 2.560e+01 2.807e+01 8.705e+01, threshold=5.119e+01, percent-clipped=2.0
2024-08-14 22:12:19,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=2871720.0, ans=0.025
2024-08-14 22:12:25,724 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 23 from Vox, 21 from AS
2024-08-14 22:12:41,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2871820.0, ans=0.125
2024-08-14 22:12:42,354 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.85 vs. limit=15.0
2024-08-14 22:12:49,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2871820.0, ans=0.0
2024-08-14 22:12:55,778 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 11850, loss[loss=0.1076, beats_loss=0.01042, ecapa_loss=0.0001609, whisper_loss=0.09553, over 23313.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01074, ecapa_loss=0.0001527, whisper_loss=0.09128, over 3955744.74 frames. ], batch size: 94, lr: 3.09e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:13:40,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2872120.0, ans=0.125
2024-08-14 22:13:53,818 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.11 vs. limit=15.0
2024-08-14 22:14:26,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2872320.0, ans=0.0
2024-08-14 22:14:36,754 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 19 from Vox, 44 from AS
2024-08-14 22:14:48,228 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 11900, loss[loss=0.09975, beats_loss=0.01128, ecapa_loss=0.0001656, whisper_loss=0.08681, over 18376.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01077, ecapa_loss=0.0001528, whisper_loss=0.09039, over 3958006.15 frames. ], batch size: 77, lr: 3.09e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:15:34,820 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 24 from Vox, 29 from AS
2024-08-14 22:15:38,764 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.11 vs. limit=10.0
2024-08-14 22:15:41,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2872620.0, ans=0.0
2024-08-14 22:15:44,670 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.256e+01 2.473e+01 2.865e+01 1.430e+02, threshold=4.947e+01, percent-clipped=1.0
2024-08-14 22:15:56,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2872720.0, ans=0.125
2024-08-14 22:16:20,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2872820.0, ans=0.125
2024-08-14 22:16:24,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2872820.0, ans=0.125
2024-08-14 22:16:26,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2872820.0, ans=0.125
2024-08-14 22:16:39,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2872920.0, ans=0.125
2024-08-14 22:16:40,743 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 11950, loss[loss=0.1274, beats_loss=0.008406, ecapa_loss=0.0001074, whisper_loss=0.1179, over 17313.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01071, ecapa_loss=0.0001537, whisper_loss=0.09062, over 3934976.33 frames. ], batch size: 62, lr: 3.09e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:17:15,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2873020.0, ans=0.0
2024-08-14 22:17:23,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2873120.0, ans=0.0
2024-08-14 22:17:30,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2873120.0, ans=0.0
2024-08-14 22:18:03,592 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 24 from Vox, 43 from AS
2024-08-14 22:18:21,957 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 12000, loss[loss=0.1164, beats_loss=0.008204, ecapa_loss=0.0001541, whisper_loss=0.1067, over 19456.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01076, ecapa_loss=0.0001542, whisper_loss=0.08956, over 3899899.69 frames. ], batch size: 73, lr: 3.09e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:18:21,958 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss
2024-08-14 22:18:34,886 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.9787, 3.0542, 3.6899, 3.5393], device='cuda:2')
2024-08-14 22:19:04,376 INFO [train_multi_KD3.py:1149] (2/4) Epoch 20, validation on ASR_libri: loss=0.2521, beats_loss=0, ecapa_loss=0.0005404, whisper_loss=0.2466, over 922467.00 frames.
2024-08-14 22:19:20,887 INFO [train_multi_KD3.py:1149] (2/4) Epoch 20, validation on SV_voxceleb1: loss=0.004324, beats_loss=0, ecapa_loss=0.0004324, whisper_loss=0, over 939242.00 frames.
2024-08-14 22:21:26,155 INFO [train_multi_KD3.py:1149] (2/4) Epoch 20, validation on AT_audioset: loss=0.02348, beats_loss=0.02348, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-14 22:21:26,160 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB
2024-08-14 22:21:41,720 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 26 from Vox, 36 from AS
2024-08-14 22:21:55,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2873620.0, ans=0.125
2024-08-14 22:22:03,640 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.280e+01 2.500e+01 2.714e+01 9.772e+01, threshold=5.000e+01, percent-clipped=1.0
2024-08-14 22:22:14,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2873720.0, ans=0.5
2024-08-14 22:22:20,378 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 32 from LS+wenet, 20 from Vox, 27 from AS
2024-08-14 22:22:23,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2873720.0, ans=0.125
2024-08-14 22:22:36,078 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.05 vs. limit=22.5
2024-08-14 22:22:41,330 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 12050, loss[loss=0.1054, beats_loss=0.01025, ecapa_loss=0.0001472, whisper_loss=0.09372, over 22506.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0107, ecapa_loss=0.0001542, whisper_loss=0.08957, over 3906942.56 frames. ], batch size: 91, lr: 3.09e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:22:41,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2873920.0, ans=0.125
2024-08-14 22:22:46,310 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 19 from Vox, 33 from AS
2024-08-14 22:22:49,005 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 27 from Vox, 37 from AS
2024-08-14 22:22:49,968 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.82 vs. limit=15.0
2024-08-14 22:22:51,320 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.31 vs. limit=15.0
2024-08-14 22:22:56,815 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 30 from LS+wenet, 18 from Vox, 38 from AS
2024-08-14 22:23:18,953 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.68 vs. limit=15.0
2024-08-14 22:23:19,728 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 24 from LS+wenet, 19 from Vox, 24 from AS
2024-08-14 22:23:24,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2874220.0, ans=0.0
2024-08-14 22:23:26,945 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 17 from Vox, 36 from AS
2024-08-14 22:23:28,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2874220.0, ans=0.0
2024-08-14 22:23:45,705 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 25 from Vox, 28 from AS
2024-08-14 22:23:48,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2874320.0, ans=0.125
2024-08-14 22:23:49,825 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.03 vs. limit=15.0
2024-08-14 22:23:56,917 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 12100, loss[loss=0.1118, beats_loss=0.01048, ecapa_loss=0.0001195, whisper_loss=0.1001, over 23218.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01066, ecapa_loss=0.0001544, whisper_loss=0.08995, over 3890343.06 frames. ], batch size: 88, lr: 3.09e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:24:02,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2874420.0, ans=0.1
2024-08-14 22:24:06,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2874420.0, ans=0.1
2024-08-14 22:24:19,237 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.563e-03
2024-08-14 22:24:25,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2874520.0, ans=0.0
2024-08-14 22:24:35,831 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.418e+01 2.574e+01 2.910e+01 4.724e+01, threshold=5.149e+01, percent-clipped=0.0
2024-08-14 22:24:36,676 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.76 vs. limit=15.0
2024-08-14 22:24:46,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2874720.0, ans=0.05
2024-08-14 22:24:50,654 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 32 from LS+wenet, 20 from Vox, 35 from AS
2024-08-14 22:24:52,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2874720.0, ans=0.0
2024-08-14 22:24:52,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2874720.0, ans=0.1
2024-08-14 22:25:13,579 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 12150, loss[loss=0.1182, beats_loss=0.008994, ecapa_loss=0.0001649, whisper_loss=0.1075, over 23414.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01061, ecapa_loss=0.0001544, whisper_loss=0.09029, over 3884335.99 frames. ], batch size: 92, lr: 3.09e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:25:21,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2874920.0, ans=0.0
2024-08-14 22:25:28,489 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 14 from LS+wenet, 23 from Vox, 31 from AS
2024-08-14 22:25:35,336 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.84 vs. limit=15.0
2024-08-14 22:25:48,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2875120.0, ans=0.0
2024-08-14 22:26:06,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2875220.0, ans=0.125
2024-08-14 22:26:28,838 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 12200, loss[loss=0.1187, beats_loss=0.0108, ecapa_loss=0.0001297, whisper_loss=0.1066, over 21810.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0106, ecapa_loss=0.0001543, whisper_loss=0.09056, over 3889050.91 frames. ], batch size: 84, lr: 3.09e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:26:44,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2875520.0, ans=0.125
2024-08-14 22:26:45,065 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 16 from Vox, 28 from AS
2024-08-14 22:26:45,953 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=8.827e+00
2024-08-14 22:26:48,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=2875520.0, ans=0.05
2024-08-14 22:27:04,212 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.01 vs. limit=15.0
2024-08-14 22:27:06,354 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.310e+01 2.621e+01 2.982e+01 1.533e+02, threshold=5.242e+01, percent-clipped=1.0
2024-08-14 22:27:30,962 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.11 vs. limit=22.5
2024-08-14 22:27:45,775 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 12250, loss[loss=0.1046, beats_loss=0.01064, ecapa_loss=0.0001473, whisper_loss=0.09249, over 21723.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01063, ecapa_loss=0.0001534, whisper_loss=0.09085, over 3889466.39 frames. ], batch size: 87, lr: 3.09e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:28:03,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2876020.0, ans=0.1
2024-08-14 22:28:34,266 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.01 vs. limit=15.0
2024-08-14 22:28:52,201 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 29 from LS+wenet, 16 from Vox, 27 from AS
2024-08-14 22:29:02,584 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 12300, loss[loss=0.0771, beats_loss=0.01189, ecapa_loss=0.0001683, whisper_loss=0.06353, over 13200.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01063, ecapa_loss=0.0001533, whisper_loss=0.09105, over 3900971.28 frames. ], batch size: 58, lr: 3.09e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:29:13,405 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 19 from Vox, 29 from AS
2024-08-14 22:29:19,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2876520.0, ans=0.0
2024-08-14 22:29:39,439 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.269e+01 2.570e+01 2.894e+01 3.715e+01, threshold=5.139e+01, percent-clipped=0.0
2024-08-14 22:29:47,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2876720.0, ans=0.125
2024-08-14 22:29:55,871 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.78 vs. limit=10.0
2024-08-14 22:30:10,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2876820.0, ans=0.09899494936611666
2024-08-14 22:30:13,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2876820.0, ans=0.125
2024-08-14 22:30:17,866 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 12350, loss[loss=0.1094, beats_loss=0.009751, ecapa_loss=0.000154, whisper_loss=0.09809, over 18837.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01068, ecapa_loss=0.0001533, whisper_loss=0.09048, over 3899202.00 frames. ], batch size: 75, lr: 3.09e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:30:23,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2876920.0, ans=0.0
2024-08-14 22:30:26,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2876920.0, ans=0.125
2024-08-14 22:30:44,628 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.245e+01
2024-08-14 22:30:50,660 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 27 from LS+wenet, 19 from Vox, 34 from AS
2024-08-14 22:30:56,095 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 24 from LS+wenet, 13 from Vox, 27 from AS
2024-08-14 22:31:10,458 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.78 vs. limit=15.0
2024-08-14 22:31:20,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2877320.0, ans=0.035
2024-08-14 22:31:29,292 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 17 from Vox, 26 from AS
2024-08-14 22:31:33,643 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 22 from Vox, 27 from AS
2024-08-14 22:31:33,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2877420.0, ans=0.125
2024-08-14 22:31:34,819 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 12400, loss[loss=0.1001, beats_loss=0.009311, ecapa_loss=0.0001737, whisper_loss=0.089, over 16982.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01065, ecapa_loss=0.0001531, whisper_loss=0.09051, over 3877469.02 frames. ], batch size: 72, lr: 3.09e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:31:42,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2877420.0, ans=0.2
2024-08-14 22:31:49,643 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 from AS
2024-08-14 22:31:51,997 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-14 22:31:55,732 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 22 from Vox, 30 from AS
2024-08-14 22:32:08,818 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 from AS
2024-08-14 22:32:09,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2877620.0, ans=0.125
2024-08-14 22:32:10,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2877620.0, ans=0.1
2024-08-14 22:32:13,148 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.352e+01 2.633e+01 2.974e+01 1.809e+02, threshold=5.265e+01, percent-clipped=2.0
2024-08-14 22:32:20,819 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 25 from LS+wenet, 14 from Vox, 36 from AS
2024-08-14 22:32:49,919 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 12450, loss[loss=0.08682, beats_loss=0.01123, ecapa_loss=0.0001283, whisper_loss=0.07431, over 18407.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01062, ecapa_loss=0.0001545, whisper_loss=0.09075, over 3877141.91 frames. ], batch size: 72, lr: 3.09e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:32:55,915 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 from AS
2024-08-14 22:33:06,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2878020.0, ans=0.125
2024-08-14 22:33:17,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2878020.0, ans=0.0
2024-08-14 22:33:18,020 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 20 from Vox, 26 from AS
2024-08-14 22:33:25,351 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.47 vs. limit=10.0
2024-08-14 22:33:41,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2878220.0, ans=0.0
2024-08-14 22:33:49,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2878320.0, ans=0.125
2024-08-14 22:33:52,309 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.76 vs. limit=22.5
2024-08-14 22:33:54,626 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 23 from Vox, 30 from AS
2024-08-14 22:34:04,792 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 12500, loss[loss=0.1035, beats_loss=0.01324, ecapa_loss=0.0001145, whisper_loss=0.0891, over 22742.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01068, ecapa_loss=0.0001539, whisper_loss=0.09017, over 3900895.10 frames. ], batch size: 89, lr: 3.09e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:34:05,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2878420.0, ans=0.0
2024-08-14 22:34:18,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2878420.0, ans=0.0
2024-08-14 22:34:33,617 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.43 vs. limit=15.0
2024-08-14 22:34:43,703 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.340e+01 2.618e+01 2.965e+01 2.177e+02, threshold=5.235e+01, percent-clipped=3.0
2024-08-14 22:34:54,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2878720.0, ans=0.0
2024-08-14 22:35:00,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2878720.0, ans=0.1
2024-08-14 22:35:06,190 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 28 from Vox, 26 from AS
2024-08-14 22:35:07,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2878820.0, ans=0.125
2024-08-14 22:35:17,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2878820.0, ans=0.125
2024-08-14 22:35:21,058 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 12550, loss[loss=0.1168, beats_loss=0.008228, ecapa_loss=0.0001726, whisper_loss=0.1068, over 20094.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01068, ecapa_loss=0.0001538, whisper_loss=0.09016, over 3884588.62 frames. ], batch size: 80, lr: 3.09e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:35:25,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2878920.0, ans=0.125
2024-08-14 22:35:28,914 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 31 from Vox, 35 from AS
2024-08-14 22:35:32,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2878920.0, ans=0.125
2024-08-14 22:35:36,100 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 15 from Vox, 38 from AS
2024-08-14 22:35:55,903 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-14 22:35:57,496 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.58 vs. limit=12.0
2024-08-14 22:36:19,058 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 35 from LS+wenet, 17 from Vox, 42 from AS
2024-08-14 22:36:27,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2879320.0, ans=0.025
2024-08-14 22:36:35,780 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 12600, loss[loss=0.1126, beats_loss=0.008723, ecapa_loss=0.0001621, whisper_loss=0.1023, over 15364.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01066, ecapa_loss=0.0001546, whisper_loss=0.09068, over 3919075.68 frames. ], batch size: 58, lr: 3.09e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:36:55,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2879520.0, ans=0.125
2024-08-14 22:36:56,538 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 17 from Vox, 22 from AS
2024-08-14 22:36:59,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2879520.0, ans=0.0
2024-08-14 22:37:06,886 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 29 from Vox, 32 from AS
2024-08-14 22:37:10,967 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 from AS
2024-08-14 22:37:13,842 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.404e+01 2.680e+01 3.035e+01 5.751e+01, threshold=5.360e+01, percent-clipped=1.0
2024-08-14 22:37:17,918 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.74 vs. limit=15.0
2024-08-14 22:37:23,332 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 18 from Vox, 23 from AS
2024-08-14 22:37:47,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2879820.0, ans=0.0
2024-08-14 22:37:50,516 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 18 from Vox, 46 from AS
2024-08-14 22:37:51,544 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 12650, loss[loss=0.1069, beats_loss=0.01257, ecapa_loss=0.0001187, whisper_loss=0.09311, over 23381.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01075, ecapa_loss=0.0001539, whisper_loss=0.09056, over 3903679.39 frames. ], batch size: 93, lr: 3.09e-03, grad_scale: 5.764607523034235e+17
2024-08-14 22:38:13,598 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 20 from Vox, 21 from AS
2024-08-14 22:38:20,992 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 from AS
2024-08-14 22:38:35,109 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.36 vs.
limit=15.0 2024-08-14 22:38:49,430 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.26 vs. limit=10.0 2024-08-14 22:38:56,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2880320.0, ans=0.07 2024-08-14 22:39:09,570 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 12700, loss[loss=0.1288, beats_loss=0.008217, ecapa_loss=0.0001194, whisper_loss=0.1194, over 22794.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01072, ecapa_loss=0.000153, whisper_loss=0.09076, over 3884963.11 frames. ], batch size: 81, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:39:36,260 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-14 22:39:38,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2880520.0, ans=0.2 2024-08-14 22:39:47,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2880620.0, ans=0.125 2024-08-14 22:39:48,471 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.668e+01 2.293e+01 2.553e+01 2.866e+01 4.469e+01, threshold=5.107e+01, percent-clipped=0.0 2024-08-14 22:40:03,223 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-14 22:40:28,370 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 12750, loss[loss=0.08388, beats_loss=0.01298, ecapa_loss=0.0001158, whisper_loss=0.06975, over 15395.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01073, ecapa_loss=0.0001539, whisper_loss=0.09115, over 3879605.85 frames. ], batch size: 59, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:40:37,607 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 
18 from LS+wenet, 9 from Vox, 26 fro AS 2024-08-14 22:40:41,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2880920.0, ans=0.125 2024-08-14 22:40:44,783 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-14 22:40:54,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2881020.0, ans=0.125 2024-08-14 22:40:57,966 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 19 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 22:41:42,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2881320.0, ans=0.2 2024-08-14 22:41:47,795 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 12800, loss[loss=0.1246, beats_loss=0.009356, ecapa_loss=0.0001679, whisper_loss=0.1136, over 16307.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01076, ecapa_loss=0.0001544, whisper_loss=0.09143, over 3881359.63 frames. ], batch size: 64, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:41:58,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2881420.0, ans=0.125 2024-08-14 22:42:00,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2881420.0, ans=0.09899494936611666 2024-08-14 22:42:06,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2881520.0, ans=0.125 2024-08-14 22:42:18,082 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 21 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 22:42:19,264 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.19 vs. 
limit=15.0 2024-08-14 22:42:27,546 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.348e+01 2.569e+01 2.991e+01 4.323e+01, threshold=5.137e+01, percent-clipped=0.0 2024-08-14 22:42:28,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2881620.0, ans=0.125 2024-08-14 22:42:38,385 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.21 vs. limit=15.0 2024-08-14 22:42:41,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2881720.0, ans=0.1 2024-08-14 22:42:45,814 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 22:42:47,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2881720.0, ans=0.0 2024-08-14 22:42:56,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2881820.0, ans=0.0 2024-08-14 22:42:57,796 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-14 22:43:07,189 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 12850, loss[loss=0.09883, beats_loss=0.009976, ecapa_loss=0.0001427, whisper_loss=0.08743, over 18326.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01084, ecapa_loss=0.0001534, whisper_loss=0.09037, over 3855052.87 frames. 
], batch size: 71, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:43:19,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2881920.0, ans=0.0 2024-08-14 22:43:25,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2882020.0, ans=0.09899494936611666 2024-08-14 22:43:29,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2882020.0, ans=0.0 2024-08-14 22:43:37,597 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-14 22:43:37,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2882120.0, ans=0.2 2024-08-14 22:43:39,114 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-14 22:43:44,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2882120.0, ans=0.5 2024-08-14 22:43:54,750 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 14 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-14 22:43:56,959 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-14 22:43:59,100 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.43 vs. limit=15.0 2024-08-14 22:44:15,005 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
19 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-14 22:44:15,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2882320.0, ans=0.125 2024-08-14 22:44:26,238 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 12900, loss[loss=0.1052, beats_loss=0.01083, ecapa_loss=0.0001463, whisper_loss=0.09292, over 19518.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01081, ecapa_loss=0.0001546, whisper_loss=0.08979, over 3863058.19 frames. ], batch size: 80, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:44:28,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2882420.0, ans=0.0 2024-08-14 22:44:35,532 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=15.0 2024-08-14 22:44:45,975 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-14 22:44:57,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2882620.0, ans=0.125 2024-08-14 22:44:58,004 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.66 vs. 
limit=15.0 2024-08-14 22:45:02,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2882620.0, ans=0.2 2024-08-14 22:45:06,661 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.310e+01 2.541e+01 3.103e+01 4.872e+01, threshold=5.081e+01, percent-clipped=0.0 2024-08-14 22:45:11,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2882620.0, ans=15.0 2024-08-14 22:45:17,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2882720.0, ans=0.125 2024-08-14 22:45:19,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2882720.0, ans=0.05 2024-08-14 22:45:20,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2882720.0, ans=0.125 2024-08-14 22:45:20,866 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.48 vs. limit=15.0 2024-08-14 22:45:44,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2882820.0, ans=0.0 2024-08-14 22:45:46,675 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 12950, loss[loss=0.129, beats_loss=0.008705, ecapa_loss=0.0001651, whisper_loss=0.1187, over 23849.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01073, ecapa_loss=0.000155, whisper_loss=0.0902, over 3867303.69 frames. 
], batch size: 90, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:45:53,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2882920.0, ans=0.0 2024-08-14 22:46:28,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2883120.0, ans=0.125 2024-08-14 22:46:28,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2883120.0, ans=0.125 2024-08-14 22:46:30,182 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.45 vs. limit=15.0 2024-08-14 22:46:47,820 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 27 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-14 22:47:05,143 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 13000, loss[loss=0.09829, beats_loss=0.01176, ecapa_loss=0.0001362, whisper_loss=0.08517, over 16933.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01066, ecapa_loss=0.0001552, whisper_loss=0.09115, over 3890586.69 frames. ], batch size: 66, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:47:11,327 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-14 22:47:14,295 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 22 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 22:47:39,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2883620.0, ans=0.125 2024-08-14 22:47:40,490 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
14 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-14 22:47:44,857 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.656e+01 2.265e+01 2.555e+01 2.965e+01 9.531e+01, threshold=5.110e+01, percent-clipped=1.0 2024-08-14 22:48:07,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2883820.0, ans=0.125 2024-08-14 22:48:15,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2883820.0, ans=0.2 2024-08-14 22:48:16,235 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.52 vs. limit=15.0 2024-08-14 22:48:23,021 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 13050, loss[loss=0.108, beats_loss=0.01046, ecapa_loss=0.0001576, whisper_loss=0.09595, over 22929.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01068, ecapa_loss=0.0001547, whisper_loss=0.09082, over 3876474.94 frames. ], batch size: 91, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:48:39,694 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-14 22:48:47,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2884020.0, ans=0.0 2024-08-14 22:49:02,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2884120.0, ans=0.1 2024-08-14 22:49:19,351 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. 
limit=6.0 2024-08-14 22:49:20,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2884220.0, ans=0.125 2024-08-14 22:49:24,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2884320.0, ans=0.1 2024-08-14 22:49:36,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2884320.0, ans=0.1 2024-08-14 22:49:39,073 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 13100, loss[loss=0.1209, beats_loss=0.008166, ecapa_loss=0.0001444, whisper_loss=0.1113, over 20029.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01071, ecapa_loss=0.0001534, whisper_loss=0.09073, over 3871698.68 frames. ], batch size: 76, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:49:49,706 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.85 vs. limit=10.0 2024-08-14 22:49:54,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2884520.0, ans=0.0 2024-08-14 22:50:09,576 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.02 vs. limit=12.0 2024-08-14 22:50:10,301 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-14 22:50:11,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2884620.0, ans=0.0 2024-08-14 22:50:17,761 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.252e+01 2.466e+01 2.762e+01 3.730e+01, threshold=4.933e+01, percent-clipped=0.0 2024-08-14 22:50:19,433 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
20 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-14 22:50:19,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2884620.0, ans=0.2 2024-08-14 22:50:29,898 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-14 22:50:39,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2884820.0, ans=0.0 2024-08-14 22:50:40,731 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-14 22:50:42,734 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.33 vs. limit=22.5 2024-08-14 22:50:45,609 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.49 vs. limit=15.0 2024-08-14 22:50:46,483 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-14 22:50:51,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2884820.0, ans=0.125 2024-08-14 22:50:55,045 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 13150, loss[loss=0.08811, beats_loss=0.01121, ecapa_loss=0.0001315, whisper_loss=0.07559, over 18956.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01066, ecapa_loss=0.0001521, whisper_loss=0.09113, over 3870090.99 frames. 
], batch size: 78, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:50:58,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2884920.0, ans=0.0 2024-08-14 22:51:21,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2885020.0, ans=0.125 2024-08-14 22:51:22,192 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-14 22:51:30,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2885120.0, ans=0.1 2024-08-14 22:51:36,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2885120.0, ans=0.125 2024-08-14 22:51:39,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2885220.0, ans=0.1 2024-08-14 22:51:59,475 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-14 22:52:08,478 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-14 22:52:11,429 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 13200, loss[loss=0.1055, beats_loss=0.01019, ecapa_loss=0.0001857, whisper_loss=0.09345, over 16505.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01064, ecapa_loss=0.0001518, whisper_loss=0.0911, over 3844784.53 frames. ], batch size: 63, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:52:15,585 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.46 vs. limit=22.5 2024-08-14 22:52:22,044 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
17 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 22:52:42,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2885620.0, ans=0.0 2024-08-14 22:52:42,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2885620.0, ans=0.5 2024-08-14 22:52:45,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2885620.0, ans=0.0 2024-08-14 22:52:49,390 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.330e+01 2.568e+01 2.844e+01 4.836e+01, threshold=5.136e+01, percent-clipped=0.0 2024-08-14 22:52:50,627 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.50 vs. limit=12.0 2024-08-14 22:52:52,841 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 22:52:53,278 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 22:53:02,067 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-14 22:53:06,462 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-14 22:53:06,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2885720.0, ans=0.125 2024-08-14 22:53:15,310 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.90 vs. 
limit=15.0 2024-08-14 22:53:17,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2885820.0, ans=0.125 2024-08-14 22:53:27,873 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 13250, loss[loss=0.1093, beats_loss=0.01121, ecapa_loss=0.0001742, whisper_loss=0.09634, over 22512.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01066, ecapa_loss=0.0001529, whisper_loss=0.09077, over 3869487.21 frames. ], batch size: 90, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:53:30,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2885920.0, ans=0.125 2024-08-14 22:53:35,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=2885920.0, ans=15.0 2024-08-14 22:53:38,622 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-14 22:53:48,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2886020.0, ans=0.0 2024-08-14 22:53:50,714 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 26 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 22:53:55,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2886020.0, ans=0.0 2024-08-14 22:54:08,557 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 
19 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-14 22:54:10,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2886120.0, ans=0.09899494936611666 2024-08-14 22:54:26,725 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.710e-01 2024-08-14 22:54:34,437 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.81 vs. limit=15.0 2024-08-14 22:54:37,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2886320.0, ans=0.125 2024-08-14 22:54:42,225 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 13300, loss[loss=0.09042, beats_loss=0.01182, ecapa_loss=0.0001244, whisper_loss=0.07735, over 15745.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0107, ecapa_loss=0.0001516, whisper_loss=0.09064, over 3860302.90 frames. ], batch size: 60, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:54:49,103 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-14 22:55:00,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=2886520.0, ans=22.5 2024-08-14 22:55:06,433 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=15.0 2024-08-14 22:55:20,776 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.690e+01 2.319e+01 2.603e+01 2.951e+01 5.075e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-14 22:55:34,183 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.35 vs. 
limit=12.0 2024-08-14 22:55:58,366 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 13350, loss[loss=0.1322, beats_loss=0.008199, ecapa_loss=0.0001183, whisper_loss=0.1228, over 24872.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01066, ecapa_loss=0.0001518, whisper_loss=0.09113, over 3852322.52 frames. ], batch size: 91, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:56:51,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2887220.0, ans=0.0 2024-08-14 22:57:03,249 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-14 22:57:13,563 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 13400, loss[loss=0.1067, beats_loss=0.01012, ecapa_loss=0.0001762, whisper_loss=0.09486, over 22442.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01064, ecapa_loss=0.0001524, whisper_loss=0.09132, over 3877483.18 frames. ], batch size: 90, lr: 3.09e-03, grad_scale: 1.152921504606847e+18 2024-08-14 22:57:24,897 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.22 vs. limit=15.0 2024-08-14 22:57:36,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2887520.0, ans=0.125 2024-08-14 22:57:49,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2887620.0, ans=0.125 2024-08-14 22:57:50,202 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.06 vs. 
limit=15.0 2024-08-14 22:57:50,682 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.597e+01 2.321e+01 2.514e+01 2.715e+01 4.037e+01, threshold=5.027e+01, percent-clipped=0.0 2024-08-14 22:58:28,180 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 13450, loss[loss=0.1166, beats_loss=0.01013, ecapa_loss=0.0001559, whisper_loss=0.1049, over 21899.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0106, ecapa_loss=0.0001532, whisper_loss=0.09167, over 3893146.67 frames. ], batch size: 90, lr: 3.09e-03, grad_scale: 1.152921504606847e+18 2024-08-14 22:58:34,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2887920.0, ans=0.125 2024-08-14 22:58:38,469 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 27 from Vox, 20 fro AS 2024-08-14 22:58:51,751 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 27 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-14 22:59:15,062 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 26 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-14 22:59:16,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2888220.0, ans=0.0 2024-08-14 22:59:24,671 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 22:59:31,849 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-14 22:59:32,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2888320.0, ans=0.0 2024-08-14 22:59:33,321 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 20 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-14 22:59:40,565 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 13500, loss[loss=0.1052, beats_loss=0.00968, ecapa_loss=0.0001617, whisper_loss=0.09392, over 21557.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.01055, ecapa_loss=0.0001542, whisper_loss=0.09153, over 3919436.33 frames. ], batch size: 84, lr: 3.09e-03, grad_scale: 1.152921504606847e+18 2024-08-14 22:59:52,106 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 30 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-14 22:59:59,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2888520.0, ans=0.125 2024-08-14 23:00:04,755 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.10 vs. limit=10.0 2024-08-14 23:00:18,917 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.433e+01 2.626e+01 2.844e+01 4.134e+01, threshold=5.252e+01, percent-clipped=0.0 2024-08-14 23:00:19,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2888620.0, ans=0.1 2024-08-14 23:00:39,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2888720.0, ans=0.125 2024-08-14 23:00:46,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2888820.0, ans=0.1 2024-08-14 23:00:46,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2888820.0, ans=0.1 2024-08-14 23:00:56,338 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 13550, loss[loss=0.04635, beats_loss=0.01501, ecapa_loss=0.0001061, whisper_loss=0.03028, over 13803.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01063, ecapa_loss=0.0001532, whisper_loss=0.09078, over 3893295.97 frames. ], batch size: 56, lr: 3.09e-03, grad_scale: 1.152921504606847e+18 2024-08-14 23:01:12,837 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
25 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-14 23:01:16,505 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.66 vs. limit=15.0 2024-08-14 23:01:20,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=2889020.0, ans=0.1 2024-08-14 23:01:25,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2889120.0, ans=0.125 2024-08-14 23:01:29,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2889120.0, ans=0.1 2024-08-14 23:01:32,428 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.91 vs. limit=10.0 2024-08-14 23:01:34,771 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 23 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-14 23:01:41,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2889220.0, ans=0.0 2024-08-14 23:01:58,663 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.34 vs. limit=15.0 2024-08-14 23:02:09,303 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 13600, loss[loss=0.1165, beats_loss=0.01006, ecapa_loss=0.0001515, whisper_loss=0.1049, over 21337.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01062, ecapa_loss=0.0001532, whisper_loss=0.09135, over 3882523.94 frames. 
], batch size: 83, lr: 3.09e-03, grad_scale: 1.152921504606847e+18 2024-08-14 23:02:21,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2889420.0, ans=0.125 2024-08-14 23:02:25,020 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.73 vs. limit=15.0 2024-08-14 23:02:41,250 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-14 23:02:43,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2889620.0, ans=0.1 2024-08-14 23:02:45,500 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.266e+01 2.507e+01 2.843e+01 1.129e+02, threshold=5.014e+01, percent-clipped=1.0 2024-08-14 23:03:01,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2889720.0, ans=0.125 2024-08-14 23:03:14,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2889820.0, ans=0.125 2024-08-14 23:03:22,507 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 13650, loss[loss=0.1149, beats_loss=0.01207, ecapa_loss=0.000141, whisper_loss=0.1014, over 19966.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01072, ecapa_loss=0.0001531, whisper_loss=0.09104, over 3851564.71 frames. ], batch size: 79, lr: 3.09e-03, grad_scale: 1.152921504606847e+18 2024-08-14 23:03:29,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2889920.0, ans=0.125 2024-08-14 23:03:35,357 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
26 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-14 23:03:47,737 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.580e-02 2024-08-14 23:03:53,106 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 18 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-14 23:04:00,556 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 23:04:22,772 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 16 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-14 23:04:37,835 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 13700, loss[loss=0.08703, beats_loss=0.01046, ecapa_loss=0.0001489, whisper_loss=0.07509, over 18502.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01083, ecapa_loss=0.0001529, whisper_loss=0.0908, over 3846250.16 frames. ], batch size: 76, lr: 3.08e-03, grad_scale: 1.152921504606847e+18 2024-08-14 23:04:42,853 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 23:04:56,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2890520.0, ans=0.125 2024-08-14 23:04:57,665 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 37 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 23:05:15,387 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.409e+01 2.626e+01 3.025e+01 7.983e+01, threshold=5.251e+01, percent-clipped=2.0 2024-08-14 23:05:27,650 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 24 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-14 23:05:36,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2890820.0, ans=0.125 2024-08-14 23:05:52,622 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 13750, loss[loss=0.08627, beats_loss=0.01236, ecapa_loss=0.0001264, whisper_loss=0.07265, over 20451.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01079, ecapa_loss=0.0001528, whisper_loss=0.09084, over 3873616.64 frames. ], batch size: 83, lr: 3.08e-03, grad_scale: 1.152921504606847e+18 2024-08-14 23:05:53,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=2890920.0, ans=0.05 2024-08-14 23:06:10,250 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.971e+00 2024-08-14 23:06:14,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2891020.0, ans=0.125 2024-08-14 23:06:14,988 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.06 vs. limit=15.0 2024-08-14 23:06:21,720 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-14 23:06:21,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2891120.0, ans=0.125 2024-08-14 23:06:40,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2891220.0, ans=0.025 2024-08-14 23:06:41,388 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 15 from Vox, 48 fro AS 2024-08-14 23:06:58,247 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.85 vs. limit=10.0 2024-08-14 23:07:07,955 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 13800, loss[loss=0.12, beats_loss=0.009665, ecapa_loss=0.000152, whisper_loss=0.1088, over 22537.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01079, ecapa_loss=0.0001519, whisper_loss=0.09068, over 3874683.11 frames. 
], batch size: 89, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:07:09,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2891420.0, ans=0.0 2024-08-14 23:07:12,156 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.46 vs. limit=15.0 2024-08-14 23:07:28,331 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.28 vs. limit=12.0 2024-08-14 23:07:42,595 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-14 23:07:46,920 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.261e+01 2.530e+01 3.041e+01 4.646e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-14 23:08:09,573 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 23:08:16,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2891820.0, ans=0.2 2024-08-14 23:08:22,811 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 13850, loss[loss=0.1128, beats_loss=0.00908, ecapa_loss=0.0001814, whisper_loss=0.1019, over 13858.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01076, ecapa_loss=0.0001521, whisper_loss=0.09073, over 3885611.61 frames. ], batch size: 57, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:08:28,162 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=15.0 2024-08-14 23:08:29,062 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
32 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-14 23:09:14,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2892220.0, ans=0.1 2024-08-14 23:09:23,836 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-14 23:09:40,129 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 13900, loss[loss=0.0873, beats_loss=0.01254, ecapa_loss=0.00014, whisper_loss=0.07337, over 21430.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01075, ecapa_loss=0.0001518, whisper_loss=0.09059, over 3875981.10 frames. ], batch size: 87, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:09:48,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2892420.0, ans=0.1 2024-08-14 23:10:02,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2892520.0, ans=0.125 2024-08-14 23:10:09,229 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 23:10:12,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2892620.0, ans=0.0 2024-08-14 23:10:13,011 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.88 vs. limit=15.0 2024-08-14 23:10:20,030 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.401e+01 2.644e+01 3.017e+01 4.177e+01, threshold=5.288e+01, percent-clipped=0.0 2024-08-14 23:10:20,321 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
21 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 23:10:25,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2892720.0, ans=0.125 2024-08-14 23:10:41,440 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.11 vs. limit=22.5 2024-08-14 23:10:42,230 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 23:10:56,040 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 13950, loss[loss=0.08905, beats_loss=0.01263, ecapa_loss=0.0001318, whisper_loss=0.0751, over 18072.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01069, ecapa_loss=0.0001519, whisper_loss=0.09131, over 3883223.76 frames. ], batch size: 73, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:10:59,195 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 23:11:09,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2893020.0, ans=0.125 2024-08-14 23:11:18,234 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.19 vs. limit=15.0 2024-08-14 23:11:32,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2893120.0, ans=0.125 2024-08-14 23:11:35,545 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 17 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-14 23:11:56,619 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-14 23:11:59,951 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
19 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-14 23:12:11,021 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 14000, loss[loss=0.1084, beats_loss=0.01153, ecapa_loss=0.0001279, whisper_loss=0.09558, over 22883.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01077, ecapa_loss=0.0001512, whisper_loss=0.09085, over 3873430.21 frames. ], batch size: 88, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:12:20,754 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 23:12:24,054 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 23:12:36,415 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2024-08-14 23:12:37,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2893520.0, ans=0.0 2024-08-14 23:12:41,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2893620.0, ans=0.125 2024-08-14 23:12:42,328 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 34 from Vox, 34 fro AS 2024-08-14 23:12:50,921 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.312e+01 2.545e+01 2.868e+01 4.909e+01, threshold=5.090e+01, percent-clipped=0.0 2024-08-14 23:12:55,748 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
29 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-14 23:13:09,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2893720.0, ans=0.125 2024-08-14 23:13:13,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2893820.0, ans=0.125 2024-08-14 23:13:23,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2893820.0, ans=0.0 2024-08-14 23:13:27,434 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 14050, loss[loss=0.1321, beats_loss=0.008798, ecapa_loss=0.0001097, whisper_loss=0.1222, over 20599.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01077, ecapa_loss=0.0001504, whisper_loss=0.09096, over 3864210.55 frames. ], batch size: 73, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:13:30,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2893920.0, ans=0.025 2024-08-14 23:13:48,536 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
12 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 23:13:48,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2894020.0, ans=0.0 2024-08-14 23:14:11,555 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.373e+01 2024-08-14 23:14:18,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2894220.0, ans=0.2 2024-08-14 23:14:33,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2894320.0, ans=0.125 2024-08-14 23:14:38,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2894320.0, ans=0.0 2024-08-14 23:14:38,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2894320.0, ans=0.125 2024-08-14 23:14:42,167 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 14100, loss[loss=0.1344, beats_loss=0.008687, ecapa_loss=0.0001736, whisper_loss=0.124, over 21954.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01077, ecapa_loss=0.0001507, whisper_loss=0.09106, over 3870998.74 frames. ], batch size: 86, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:14:44,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2894420.0, ans=0.0 2024-08-14 23:14:46,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2894420.0, ans=0.5 2024-08-14 23:14:48,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2894420.0, ans=0.125 2024-08-14 23:14:50,063 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
13 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 23:15:08,182 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 35 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 23:15:11,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2894620.0, ans=0.125 2024-08-14 23:15:17,536 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-14 23:15:21,476 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.675e+01 2.405e+01 2.712e+01 3.125e+01 2.483e+02, threshold=5.424e+01, percent-clipped=1.0 2024-08-14 23:15:21,728 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-14 23:15:22,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2894620.0, ans=0.2 2024-08-14 23:15:23,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2894620.0, ans=0.125 2024-08-14 23:15:26,136 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 27 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-14 23:15:31,024 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.71 vs. limit=12.0 2024-08-14 23:15:42,539 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-14 23:15:57,329 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 14150, loss[loss=0.07871, beats_loss=0.009679, ecapa_loss=0.0001333, whisper_loss=0.0677, over 17180.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01082, ecapa_loss=0.0001513, whisper_loss=0.09037, over 3900676.20 frames. 
], batch size: 65, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:16:04,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2894920.0, ans=10.0 2024-08-14 23:16:07,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2894920.0, ans=0.1 2024-08-14 23:16:12,396 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.27 vs. limit=15.0 2024-08-14 23:16:28,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2895120.0, ans=0.125 2024-08-14 23:16:55,032 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 13 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-14 23:16:58,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2895320.0, ans=0.0 2024-08-14 23:17:00,507 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.06 vs. limit=15.0 2024-08-14 23:17:10,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2895320.0, ans=0.1 2024-08-14 23:17:12,624 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 14200, loss[loss=0.1104, beats_loss=0.008892, ecapa_loss=0.0002197, whisper_loss=0.09928, over 22662.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01083, ecapa_loss=0.0001522, whisper_loss=0.08973, over 3890044.35 frames. 
], batch size: 94, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:17:25,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2895420.0, ans=0.0 2024-08-14 23:17:25,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2895420.0, ans=0.125 2024-08-14 23:17:31,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2895520.0, ans=0.125 2024-08-14 23:17:46,354 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.76 vs. limit=10.0 2024-08-14 23:17:52,660 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.327e+01 2.651e+01 3.057e+01 2.487e+02, threshold=5.302e+01, percent-clipped=2.0 2024-08-14 23:17:53,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2895620.0, ans=0.125 2024-08-14 23:18:25,442 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.60 vs. limit=15.0 2024-08-14 23:18:28,481 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 14250, loss[loss=0.08622, beats_loss=0.01303, ecapa_loss=0.000123, whisper_loss=0.07196, over 23040.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01076, ecapa_loss=0.0001518, whisper_loss=0.09023, over 3885054.24 frames. ], batch size: 91, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:19:37,835 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-14 23:19:41,461 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.35 vs. 
limit=15.0 2024-08-14 23:19:45,071 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 14300, loss[loss=0.1057, beats_loss=0.01006, ecapa_loss=0.0001896, whisper_loss=0.09372, over 13524.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01077, ecapa_loss=0.0001528, whisper_loss=0.08993, over 3909716.08 frames. ], batch size: 56, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:19:59,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2896520.0, ans=0.125 2024-08-14 23:20:03,129 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 12 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-14 23:20:03,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2896520.0, ans=0.125 2024-08-14 23:20:03,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2896520.0, ans=0.0 2024-08-14 23:20:03,820 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.59 vs. limit=15.0 2024-08-14 23:20:04,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2896520.0, ans=0.0 2024-08-14 23:20:23,669 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.328e+01 2.532e+01 2.890e+01 9.959e+01, threshold=5.063e+01, percent-clipped=3.0 2024-08-14 23:20:30,750 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.35 vs. 
limit=6.0 2024-08-14 23:20:32,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2896720.0, ans=0.0 2024-08-14 23:20:33,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2896720.0, ans=10.0 2024-08-14 23:20:45,204 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.49 vs. limit=22.5 2024-08-14 23:20:52,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2896820.0, ans=0.0 2024-08-14 23:20:53,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2896820.0, ans=0.2 2024-08-14 23:20:58,739 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 14350, loss[loss=0.1072, beats_loss=0.01014, ecapa_loss=0.0001865, whisper_loss=0.09516, over 22025.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01068, ecapa_loss=0.0001545, whisper_loss=0.0902, over 3927791.35 frames. ], batch size: 93, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:20:59,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2896920.0, ans=0.1 2024-08-14 23:20:59,558 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.24 vs. limit=15.0 2024-08-14 23:21:07,737 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 20 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-14 23:21:09,108 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-14 23:21:10,536 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
26 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-14 23:21:12,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2897020.0, ans=0.0 2024-08-14 23:21:20,297 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.91 vs. limit=22.5 2024-08-14 23:21:56,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2897320.0, ans=0.125 2024-08-14 23:22:00,788 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-14 23:22:04,151 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=15.0 2024-08-14 23:22:05,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2897320.0, ans=0.025 2024-08-14 23:22:11,664 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 14400, loss[loss=0.1164, beats_loss=0.00855, ecapa_loss=0.000142, whisper_loss=0.1064, over 16245.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01064, ecapa_loss=0.0001539, whisper_loss=0.09059, over 3943079.93 frames. ], batch size: 61, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:22:23,783 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-14 23:22:32,095 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-14 23:22:44,127 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
23 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 23:22:51,283 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.446e+01 2.693e+01 3.071e+01 5.241e+01, threshold=5.387e+01, percent-clipped=1.0 2024-08-14 23:23:21,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=2897820.0, ans=0.05 2024-08-14 23:23:26,816 INFO [train_multi_KD3.py:1116] (2/4) Epoch 20, batch 14450, loss[loss=0.1041, beats_loss=0.01209, ecapa_loss=0.0001454, whisper_loss=0.09055, over 22562.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0107, ecapa_loss=0.0001537, whisper_loss=0.09013, over 3898811.62 frames. ], batch size: 88, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:23:42,397 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 9 from Vox, 30 fro AS 2024-08-14 23:24:08,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2898120.0, ans=0.0 2024-08-14 23:25:02,523 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 0, loss[loss=0.09628, beats_loss=0.009977, ecapa_loss=0.0001786, whisper_loss=0.08451, over 21849.00 frames. ], tot_loss[loss=0.09628, beats_loss=0.009977, ecapa_loss=0.0001786, whisper_loss=0.08451, over 21849.00 frames. ], batch size: 91, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:25:02,524 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-14 23:25:46,191 INFO [train_multi_KD3.py:1149] (2/4) Epoch 21, validation on ASR_libri: loss=0.2536, beats_loss=0, ecapa_loss=0.0005489, whisper_loss=0.2481, over 922467.00 frames. 2024-08-14 23:26:02,308 INFO [train_multi_KD3.py:1149] (2/4) Epoch 21, validation on SV_voxceleb1: loss=0.004256, beats_loss=0, ecapa_loss=0.0004256, whisper_loss=0, over 939242.00 frames. 
2024-08-14 23:28:02,905 INFO [train_multi_KD3.py:1149] (2/4) Epoch 21, validation on AT_audioset: loss=0.02343, beats_loss=0.02343, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 23:28:02,908 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-14 23:28:04,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=2898350.0, ans=0.02 2024-08-14 23:28:46,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2898450.0, ans=0.125 2024-08-14 23:29:07,729 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 33 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 23:29:12,228 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 17 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-14 23:29:22,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2898650.0, ans=0.125 2024-08-14 23:29:30,519 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.477e+01 2.723e+01 3.011e+01 4.734e+01, threshold=5.445e+01, percent-clipped=0.0 2024-08-14 23:29:41,213 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-14 23:29:46,027 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.25 vs. limit=10.0 2024-08-14 23:29:51,190 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 22 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-14 23:30:07,972 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.32 vs. 
limit=22.5 2024-08-14 23:30:13,508 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 50, loss[loss=0.1003, beats_loss=0.009287, ecapa_loss=0.0001469, whisper_loss=0.08958, over 18491.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01001, ecapa_loss=0.0001555, whisper_loss=0.08922, over 915333.37 frames. ], batch size: 73, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:30:51,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2898950.0, ans=0.125 2024-08-14 23:30:53,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2898950.0, ans=0.125 2024-08-14 23:31:19,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2899050.0, ans=0.2 2024-08-14 23:31:36,294 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.54 vs. limit=22.5 2024-08-14 23:32:13,752 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 100, loss[loss=0.1017, beats_loss=0.008777, ecapa_loss=0.0001997, whisper_loss=0.09091, over 13485.00 frames. ], tot_loss[loss=0.09896, beats_loss=0.009695, ecapa_loss=0.0001571, whisper_loss=0.08769, over 1553564.93 frames. 
], batch size: 56, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:32:27,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2899350.0, ans=0.0 2024-08-14 23:32:43,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2899450.0, ans=0.125 2024-08-14 23:32:55,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2899450.0, ans=0.125 2024-08-14 23:32:55,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2899450.0, ans=0.125 2024-08-14 23:33:05,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2899550.0, ans=0.0 2024-08-14 23:33:07,574 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 22 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-14 23:33:10,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2899550.0, ans=0.125 2024-08-14 23:33:31,813 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.639e+01 2.885e+01 3.247e+01 3.567e+02, threshold=5.770e+01, percent-clipped=1.0 2024-08-14 23:33:35,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2899650.0, ans=0.125 2024-08-14 23:33:50,685 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 24 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 23:33:59,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2899750.0, ans=0.2 2024-08-14 23:34:07,770 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 150, loss[loss=0.08888, beats_loss=0.007386, ecapa_loss=0.0001418, whisper_loss=0.08007, over 19355.00 frames. 
], tot_loss[loss=0.09914, beats_loss=0.009734, ecapa_loss=0.0001549, whisper_loss=0.08786, over 2064912.95 frames. ], batch size: 68, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:34:17,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2899850.0, ans=0.1 2024-08-14 23:34:17,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2899850.0, ans=0.2 2024-08-14 23:34:25,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2899950.0, ans=0.1 2024-08-14 23:34:30,923 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 27 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-14 23:34:31,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2899950.0, ans=0.125 2024-08-14 23:34:46,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2900050.0, ans=0.1 2024-08-14 23:34:50,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=2900050.0, ans=0.1 2024-08-14 23:35:16,639 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-14 23:35:20,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2900250.0, ans=0.1 2024-08-14 23:35:29,101 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 16 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-14 23:35:31,923 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 200, loss[loss=0.09792, beats_loss=0.01269, ecapa_loss=0.0001651, whisper_loss=0.08358, over 22376.00 frames. 
], tot_loss[loss=0.1, beats_loss=0.009841, ecapa_loss=0.000155, whisper_loss=0.0886, over 2420629.78 frames. ], batch size: 95, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:35:34,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2900350.0, ans=0.0 2024-08-14 23:35:57,954 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 23:36:19,206 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.41 vs. limit=15.0 2024-08-14 23:36:20,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2900650.0, ans=0.125 2024-08-14 23:36:23,831 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.304e+01 2.562e+01 2.943e+01 6.143e+01, threshold=5.124e+01, percent-clipped=1.0 2024-08-14 23:36:49,618 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 250, loss[loss=0.1074, beats_loss=0.009542, ecapa_loss=0.0001178, whisper_loss=0.09668, over 19967.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.009967, ecapa_loss=0.0001549, whisper_loss=0.09, over 2703893.85 frames. ], batch size: 73, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:37:00,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2900850.0, ans=10.0 2024-08-14 23:37:13,165 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 23:37:16,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2900950.0, ans=0.0 2024-08-14 23:37:42,028 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
27 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 23:37:55,062 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-14 23:38:01,927 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 300, loss[loss=0.07101, beats_loss=0.01406, ecapa_loss=0.0001251, whisper_loss=0.0557, over 17268.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01029, ecapa_loss=0.000154, whisper_loss=0.0893, over 2949241.54 frames. ], batch size: 68, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:38:08,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2901350.0, ans=0.0 2024-08-14 23:38:14,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2901350.0, ans=0.0 2024-08-14 23:38:14,643 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.96 vs. limit=22.5 2024-08-14 23:38:15,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2901450.0, ans=0.0 2024-08-14 23:38:25,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2901450.0, ans=0.0 2024-08-14 23:38:49,263 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.216e+01 2.512e+01 2.821e+01 4.988e+01, threshold=5.024e+01, percent-clipped=0.0 2024-08-14 23:39:13,193 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 350, loss[loss=0.08656, beats_loss=0.01139, ecapa_loss=0.0001654, whisper_loss=0.07351, over 21708.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0103, ecapa_loss=0.0001542, whisper_loss=0.08916, over 3097062.85 frames. 
], batch size: 92, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:39:43,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2902050.0, ans=0.125 2024-08-14 23:39:57,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2902150.0, ans=0.125 2024-08-14 23:40:05,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2902150.0, ans=0.2 2024-08-14 23:40:10,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2902150.0, ans=0.2 2024-08-14 23:40:10,327 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=15.0 2024-08-14 23:40:13,201 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 23:40:14,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2902250.0, ans=0.125 2024-08-14 23:40:27,222 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 400, loss[loss=0.1008, beats_loss=0.01121, ecapa_loss=0.0001577, whisper_loss=0.088, over 18591.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01041, ecapa_loss=0.0001523, whisper_loss=0.08879, over 3287405.10 frames. 
], batch size: 71, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:41:07,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2902550.0, ans=0.125 2024-08-14 23:41:07,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2902550.0, ans=0.125 2024-08-14 23:41:18,395 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.395e+01 2.699e+01 3.154e+01 2.910e+02, threshold=5.398e+01, percent-clipped=2.0 2024-08-14 23:41:19,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2902650.0, ans=0.0 2024-08-14 23:41:20,164 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 23:41:22,860 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 23:41:24,700 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.53 vs. limit=15.0 2024-08-14 23:41:28,484 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-14 23:41:29,100 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.41 vs. 
limit=10.0 2024-08-14 23:41:34,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2902750.0, ans=0.125 2024-08-14 23:41:43,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2902750.0, ans=0.05 2024-08-14 23:41:44,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2902850.0, ans=0.125 2024-08-14 23:41:45,451 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 450, loss[loss=0.1121, beats_loss=0.009691, ecapa_loss=0.0001831, whisper_loss=0.1006, over 21815.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0104, ecapa_loss=0.0001529, whisper_loss=0.08896, over 3394162.27 frames. ], batch size: 91, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:41:51,341 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.15 vs. limit=15.0 2024-08-14 23:42:00,806 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2024-08-14 23:42:23,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=2903050.0, ans=22.5 2024-08-14 23:42:36,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2903150.0, ans=0.07 2024-08-14 23:43:04,656 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 500, loss[loss=0.1042, beats_loss=0.01082, ecapa_loss=0.0001649, whisper_loss=0.09172, over 15914.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01037, ecapa_loss=0.0001533, whisper_loss=0.08907, over 3503747.45 frames. 
], batch size: 63, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:43:14,293 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 23:43:20,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2903450.0, ans=0.125 2024-08-14 23:43:29,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2903450.0, ans=0.1 2024-08-14 23:43:33,693 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 25 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 23:43:43,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2903550.0, ans=0.125 2024-08-14 23:43:46,113 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-14 23:43:55,661 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.351e+01 2.577e+01 2.922e+01 3.343e+02, threshold=5.154e+01, percent-clipped=2.0 2024-08-14 23:43:55,795 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-14 23:43:59,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2903650.0, ans=0.125 2024-08-14 23:44:02,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2903650.0, ans=0.125 2024-08-14 23:44:08,702 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-14 23:44:14,658 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-14 23:44:18,642 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 550, loss[loss=0.0707, beats_loss=0.01313, ecapa_loss=0.0001216, whisper_loss=0.05636, over 17751.00 frames. 
], tot_loss[loss=0.101, beats_loss=0.01044, ecapa_loss=0.0001516, whisper_loss=0.08906, over 3583856.64 frames. ], batch size: 69, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:44:39,252 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-14 23:44:52,995 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.92 vs. limit=15.0 2024-08-14 23:44:55,423 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.885e+01 2024-08-14 23:44:56,418 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 23:45:02,889 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 23:45:04,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2904150.0, ans=0.125 2024-08-14 23:45:17,078 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 27 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-14 23:45:22,864 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.73 vs. limit=15.0 2024-08-14 23:45:24,577 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 600, loss[loss=0.09316, beats_loss=0.008376, ecapa_loss=0.0001896, whisper_loss=0.08289, over 17061.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01037, ecapa_loss=0.0001509, whisper_loss=0.09015, over 3634677.34 frames. 
], batch size: 70, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:45:27,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2904350.0, ans=0.125 2024-08-14 23:45:33,364 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.69 vs. limit=15.0 2024-08-14 23:45:34,073 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 23:45:43,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2904450.0, ans=0.035 2024-08-14 23:45:48,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2904450.0, ans=0.07 2024-08-14 23:45:54,484 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=15.0 2024-08-14 23:46:08,852 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.681e+01 2.318e+01 2.599e+01 2.895e+01 9.632e+01, threshold=5.197e+01, percent-clipped=3.0 2024-08-14 23:46:13,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2904650.0, ans=0.07 2024-08-14 23:46:21,376 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.52 vs. limit=12.0 2024-08-14 23:46:26,087 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-14 23:46:29,977 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 650, loss[loss=0.115, beats_loss=0.01006, ecapa_loss=0.0001361, whisper_loss=0.1036, over 14925.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01027, ecapa_loss=0.0001511, whisper_loss=0.09067, over 3667374.29 frames. 
], batch size: 57, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:46:37,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2904850.0, ans=0.125 2024-08-14 23:47:02,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2905050.0, ans=0.0 2024-08-14 23:47:04,320 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 23:47:32,550 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 36 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-14 23:47:34,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2905250.0, ans=0.1 2024-08-14 23:47:36,000 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.27 vs. limit=15.0 2024-08-14 23:47:36,380 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 700, loss[loss=0.1041, beats_loss=0.01061, ecapa_loss=0.0001675, whisper_loss=0.09182, over 15226.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01031, ecapa_loss=0.000152, whisper_loss=0.09073, over 3686770.61 frames. ], batch size: 63, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:47:38,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2905350.0, ans=0.125 2024-08-14 23:47:41,683 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 23:47:52,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2905450.0, ans=0.125 2024-08-14 23:48:00,004 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-14 23:48:21,181 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.342e+01 2.517e+01 2.889e+01 6.845e+01, threshold=5.033e+01, percent-clipped=2.0 2024-08-14 23:48:25,621 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.60 vs. limit=22.5 2024-08-14 23:48:37,492 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.76 vs. limit=22.5 2024-08-14 23:48:41,938 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 750, loss[loss=0.1195, beats_loss=0.008784, ecapa_loss=0.0001396, whisper_loss=0.1093, over 17571.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01046, ecapa_loss=0.0001512, whisper_loss=0.09, over 3728213.21 frames. ], batch size: 65, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:48:46,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2905850.0, ans=0.125 2024-08-14 23:49:13,295 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-14 23:49:29,035 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 23 from LS+wenet, 18 from Vox, 50 fro AS 2024-08-14 23:49:46,110 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 9 from Vox, 36 fro AS 2024-08-14 23:49:47,267 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 800, loss[loss=0.09914, beats_loss=0.01168, ecapa_loss=0.0001041, whisper_loss=0.08642, over 18386.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01052, ecapa_loss=0.0001506, whisper_loss=0.08951, over 3754635.33 frames. 
], batch size: 66, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:49:52,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2906350.0, ans=0.125 2024-08-14 23:50:25,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2906650.0, ans=0.1 2024-08-14 23:50:31,721 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.222e+01 2.469e+01 2.749e+01 4.131e+01, threshold=4.938e+01, percent-clipped=0.0 2024-08-14 23:50:36,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2906650.0, ans=0.09899494936611666 2024-08-14 23:50:44,127 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.122e-02 2024-08-14 23:50:52,376 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 850, loss[loss=0.1, beats_loss=0.01167, ecapa_loss=0.0001411, whisper_loss=0.08691, over 21654.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01041, ecapa_loss=0.0001515, whisper_loss=0.08981, over 3768836.51 frames. ], batch size: 87, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:51:03,699 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.81 vs. limit=22.5 2024-08-14 23:51:20,407 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-14 23:51:27,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2907050.0, ans=0.125 2024-08-14 23:51:28,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2907050.0, ans=0.0 2024-08-14 23:51:28,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2907050.0, ans=0.0 2024-08-14 23:51:44,504 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-14 23:51:47,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2907250.0, ans=0.0 2024-08-14 23:51:58,083 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.88 vs. limit=15.0 2024-08-14 23:51:58,596 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 900, loss[loss=0.1054, beats_loss=0.0103, ecapa_loss=0.0001365, whisper_loss=0.09373, over 16740.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0105, ecapa_loss=0.0001507, whisper_loss=0.08921, over 3778269.40 frames. ], batch size: 63, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:52:07,975 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 22 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-14 23:52:11,899 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
25 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-14 23:52:12,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2907450.0, ans=0.1 2024-08-14 23:52:29,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2907550.0, ans=0.0 2024-08-14 23:52:43,578 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.513e+01 2.774e+01 3.197e+01 6.969e+01, threshold=5.548e+01, percent-clipped=1.0 2024-08-14 23:53:05,112 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 950, loss[loss=0.108, beats_loss=0.00972, ecapa_loss=0.0001704, whisper_loss=0.0966, over 18584.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01041, ecapa_loss=0.0001509, whisper_loss=0.08975, over 3802575.61 frames. ], batch size: 75, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:53:06,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2907850.0, ans=0.125 2024-08-14 23:53:11,540 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-14 23:53:20,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2907950.0, ans=0.0 2024-08-14 23:53:23,111 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 24 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-14 23:53:37,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2908050.0, ans=0.125 2024-08-14 23:53:46,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2908150.0, ans=0.0 2024-08-14 23:53:50,124 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.26 vs. 
limit=22.5 2024-08-14 23:53:52,228 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-14 23:53:59,702 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 23:54:06,207 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.53 vs. limit=15.0 2024-08-14 23:54:10,594 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 1000, loss[loss=0.09658, beats_loss=0.00821, ecapa_loss=0.0001402, whisper_loss=0.08697, over 15676.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01049, ecapa_loss=0.0001502, whisper_loss=0.08967, over 3795990.22 frames. ], batch size: 57, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:54:14,731 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-14 23:54:17,294 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-14 23:54:55,036 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.358e+01 2.563e+01 2.906e+01 4.216e+01, threshold=5.126e+01, percent-clipped=0.0 2024-08-14 23:55:15,888 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 1050, loss[loss=0.1113, beats_loss=0.01103, ecapa_loss=0.0001218, whisper_loss=0.09906, over 21374.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01052, ecapa_loss=0.0001505, whisper_loss=0.08957, over 3821449.65 frames. ], batch size: 80, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:55:17,830 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.78 vs. 
limit=15.0 2024-08-14 23:55:23,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2908850.0, ans=0.0 2024-08-14 23:55:25,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2908850.0, ans=0.125 2024-08-14 23:55:30,891 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.833e-02 2024-08-14 23:55:33,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2908950.0, ans=0.0 2024-08-14 23:55:43,130 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-14 23:55:50,455 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 26 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-14 23:55:51,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2909050.0, ans=0.125 2024-08-14 23:55:58,168 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-14 23:56:00,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2909150.0, ans=0.125 2024-08-14 23:56:12,420 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.11 vs. limit=12.0 2024-08-14 23:56:21,306 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 1100, loss[loss=0.1091, beats_loss=0.008282, ecapa_loss=0.0001469, whisper_loss=0.0993, over 16832.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01055, ecapa_loss=0.0001485, whisper_loss=0.08994, over 3834425.09 frames. 
], batch size: 66, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:56:26,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=2909350.0, ans=0.05 2024-08-14 23:56:45,439 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=15.0 2024-08-14 23:56:47,693 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2909550.0, ans=0.1 2024-08-14 23:56:50,468 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.83 vs. limit=12.0 2024-08-14 23:56:53,977 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 23 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 23:57:04,392 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 22 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-14 23:57:05,441 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.324e+01 2.535e+01 2.853e+01 4.555e+01, threshold=5.069e+01, percent-clipped=0.0 2024-08-14 23:57:24,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2909750.0, ans=0.0 2024-08-14 23:57:26,564 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 1150, loss[loss=0.1196, beats_loss=0.009565, ecapa_loss=0.000165, whisper_loss=0.1083, over 22711.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01053, ecapa_loss=0.00015, whisper_loss=0.08986, over 3839385.78 frames. 
], batch size: 91, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:57:46,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2909950.0, ans=0.0 2024-08-14 23:57:51,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2910050.0, ans=0.2 2024-08-14 23:58:08,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2910150.0, ans=0.125 2024-08-14 23:58:09,404 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.62 vs. limit=10.0 2024-08-14 23:58:18,798 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08089441806077957, model_norm_threshold=50.69233322143555 2024-08-14 23:58:18,971 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.38, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.506e+05, grad_sumsq=1.506e+05, orig_rms_sq=1.000e+00 2024-08-14 23:58:32,262 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 1200, loss[loss=0.0918, beats_loss=0.009179, ecapa_loss=0.0001701, whisper_loss=0.08092, over 14473.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01058, ecapa_loss=0.0001506, whisper_loss=0.09001, over 3859914.31 frames. ], batch size: 59, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:58:38,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2910350.0, ans=0.0 2024-08-14 23:58:43,421 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 23 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-14 23:58:43,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2910350.0, ans=0.2 2024-08-14 23:59:01,759 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
25 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-14 23:59:03,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2910550.0, ans=0.2 2024-08-14 23:59:04,097 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-14 23:59:04,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2910550.0, ans=0.1 2024-08-14 23:59:16,072 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-14 23:59:18,312 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.50 vs. limit=12.0 2024-08-14 23:59:18,487 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.273e+01 2.471e+01 2.921e+01 6.266e+02, threshold=4.943e+01, percent-clipped=3.0 2024-08-14 23:59:31,072 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.53 vs. limit=6.0 2024-08-14 23:59:39,931 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 1250, loss[loss=0.09461, beats_loss=0.01046, ecapa_loss=0.0001704, whisper_loss=0.08244, over 13937.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01058, ecapa_loss=0.0001507, whisper_loss=0.08988, over 3869800.90 frames. ], batch size: 54, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:59:46,596 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-14 23:59:46,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2910850.0, ans=0.125 2024-08-15 00:00:02,023 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.46 vs. 
limit=22.5 2024-08-15 00:00:10,977 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-15 00:00:12,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2911050.0, ans=0.2 2024-08-15 00:00:15,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2911050.0, ans=0.1 2024-08-15 00:00:50,007 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 23 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-15 00:00:51,596 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 1300, loss[loss=0.09315, beats_loss=0.0114, ecapa_loss=0.0001535, whisper_loss=0.08021, over 20394.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01067, ecapa_loss=0.0001497, whisper_loss=0.08943, over 3857990.42 frames. ], batch size: 85, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:01:00,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2911350.0, ans=0.125 2024-08-15 00:01:02,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2911350.0, ans=0.1 2024-08-15 00:01:06,997 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.65 vs. limit=15.0 2024-08-15 00:01:17,401 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 16 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-15 00:01:18,587 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
24 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-15 00:01:40,984 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.605e+01 2.220e+01 2.514e+01 2.820e+01 5.708e+01, threshold=5.028e+01, percent-clipped=1.0 2024-08-15 00:01:59,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2911750.0, ans=0.125 2024-08-15 00:02:02,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2911750.0, ans=0.07 2024-08-15 00:02:07,111 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 1350, loss[loss=0.1005, beats_loss=0.008765, ecapa_loss=0.0001455, whisper_loss=0.09025, over 18370.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01065, ecapa_loss=0.0001495, whisper_loss=0.08933, over 3829167.11 frames. ], batch size: 68, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:02:36,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2911950.0, ans=0.0 2024-08-15 00:02:46,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2912050.0, ans=0.1 2024-08-15 00:02:50,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2912050.0, ans=0.0 2024-08-15 00:03:06,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2912150.0, ans=0.2 2024-08-15 00:03:26,172 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 1400, loss[loss=0.08354, beats_loss=0.01293, ecapa_loss=0.0001192, whisper_loss=0.06942, over 20988.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01077, ecapa_loss=0.0001479, whisper_loss=0.0883, over 3824021.35 frames. 
], batch size: 85, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:03:28,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2912350.0, ans=0.0 2024-08-15 00:03:39,137 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 17 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-15 00:03:40,784 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 00:03:56,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2912550.0, ans=0.125 2024-08-15 00:04:18,977 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+01 2.213e+01 2.409e+01 2.857e+01 4.258e+01, threshold=4.818e+01, percent-clipped=0.0 2024-08-15 00:04:34,523 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-15 00:05:00,908 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 1450, loss[loss=0.09933, beats_loss=0.01038, ecapa_loss=0.0001678, whisper_loss=0.08727, over 17820.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.0107, ecapa_loss=0.000148, whisper_loss=0.08855, over 3825342.04 frames. ], batch size: 73, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:05:02,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2912850.0, ans=0.0 2024-08-15 00:05:06,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2912850.0, ans=0.125 2024-08-15 00:05:08,309 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2912850.0, ans=0.0 2024-08-15 00:05:09,572 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
27 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-15 00:05:24,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2912950.0, ans=0.1 2024-08-15 00:05:40,366 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 24 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-15 00:05:51,689 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-15 00:05:51,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2913150.0, ans=0.0 2024-08-15 00:05:53,032 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-15 00:06:13,529 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-15 00:06:17,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2913250.0, ans=0.025 2024-08-15 00:06:25,525 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 1500, loss[loss=0.116, beats_loss=0.01069, ecapa_loss=0.0001595, whisper_loss=0.1037, over 22511.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01071, ecapa_loss=0.0001479, whisper_loss=0.08877, over 3837061.54 frames. ], batch size: 85, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:06:29,572 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-15 00:07:03,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2913550.0, ans=0.125 2024-08-15 00:07:20,021 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.692e+01 2.340e+01 2.607e+01 2.901e+01 2.472e+02, threshold=5.215e+01, percent-clipped=2.0 2024-08-15 00:07:42,396 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
32 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-15 00:07:43,301 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.97 vs. limit=8.0 2024-08-15 00:07:44,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2913750.0, ans=0.0 2024-08-15 00:07:54,049 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 1550, loss[loss=0.1216, beats_loss=0.008945, ecapa_loss=0.0001443, whisper_loss=0.1112, over 15718.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01065, ecapa_loss=0.0001471, whisper_loss=0.0894, over 3828809.45 frames. ], batch size: 59, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:08:18,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2913950.0, ans=0.0 2024-08-15 00:08:35,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2914050.0, ans=0.0 2024-08-15 00:08:49,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2914050.0, ans=0.125 2024-08-15 00:09:00,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2914150.0, ans=0.0 2024-08-15 00:09:36,512 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 1600, loss[loss=0.08488, beats_loss=0.01321, ecapa_loss=0.0001166, whisper_loss=0.07051, over 21544.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01068, ecapa_loss=0.0001472, whisper_loss=0.08936, over 3810037.25 frames. ], batch size: 85, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:09:45,272 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
25 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-15 00:09:50,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2914350.0, ans=0.2 2024-08-15 00:10:18,856 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-15 00:10:56,664 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=15.0 2024-08-15 00:10:57,273 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.670e+01 2.368e+01 2.559e+01 2.879e+01 3.704e+01, threshold=5.117e+01, percent-clipped=0.0 2024-08-15 00:11:01,782 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.20 vs. limit=15.0 2024-08-15 00:11:20,298 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-15 00:11:25,946 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 22 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-15 00:11:36,677 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 1650, loss[loss=0.09241, beats_loss=0.009465, ecapa_loss=0.0001526, whisper_loss=0.08142, over 17468.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01071, ecapa_loss=0.0001468, whisper_loss=0.08921, over 3807118.45 frames. ], batch size: 68, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:11:43,110 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
26 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-15 00:11:45,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2914850.0, ans=0.125 2024-08-15 00:11:46,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2914850.0, ans=0.07 2024-08-15 00:12:32,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2915050.0, ans=0.1 2024-08-15 00:13:24,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2915250.0, ans=0.125 2024-08-15 00:13:33,645 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-15 00:13:33,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2915350.0, ans=0.07 2024-08-15 00:13:36,042 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 1700, loss[loss=0.1067, beats_loss=0.009972, ecapa_loss=0.0001378, whisper_loss=0.09535, over 17602.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01064, ecapa_loss=0.0001468, whisper_loss=0.08948, over 3835336.79 frames. ], batch size: 70, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:14:11,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2915450.0, ans=0.0 2024-08-15 00:14:24,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2915550.0, ans=0.2 2024-08-15 00:14:25,412 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 36 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-15 00:14:28,098 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.37 vs. 
limit=10.0 2024-08-15 00:14:32,459 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 21 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-15 00:14:48,333 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-15 00:14:55,578 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.283e+01 2.594e+01 2.900e+01 5.252e+01, threshold=5.187e+01, percent-clipped=1.0 2024-08-15 00:15:10,944 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 00:15:18,104 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-15 00:15:22,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2915750.0, ans=0.0 2024-08-15 00:15:30,639 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 1750, loss[loss=0.1125, beats_loss=0.009688, ecapa_loss=0.0001574, whisper_loss=0.1012, over 22780.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01067, ecapa_loss=0.0001476, whisper_loss=0.08949, over 3841696.40 frames. ], batch size: 90, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:15:33,604 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-15 00:15:39,821 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 16 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-15 00:15:49,744 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
9 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-15 00:16:00,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2915950.0, ans=0.09899494936611666 2024-08-15 00:16:00,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2915950.0, ans=0.0 2024-08-15 00:16:00,575 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.17 vs. limit=15.0 2024-08-15 00:16:03,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2916050.0, ans=0.1 2024-08-15 00:16:09,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2916050.0, ans=0.125 2024-08-15 00:16:16,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2916150.0, ans=0.1 2024-08-15 00:16:24,528 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-15 00:16:29,790 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 18 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-15 00:16:35,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2916250.0, ans=0.125 2024-08-15 00:16:36,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2916250.0, ans=0.1 2024-08-15 00:16:42,964 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 1800, loss[loss=0.1229, beats_loss=0.009129, ecapa_loss=0.0001265, whisper_loss=0.1125, over 16582.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01069, ecapa_loss=0.0001475, whisper_loss=0.08929, over 3828196.42 frames. 
], batch size: 62, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:16:49,906 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 22 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-15 00:16:50,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2916350.0, ans=0.0 2024-08-15 00:16:58,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2916450.0, ans=0.2 2024-08-15 00:16:58,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2916450.0, ans=0.125 2024-08-15 00:17:10,941 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=15.0 2024-08-15 00:17:14,487 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-15 00:17:15,856 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-15 00:17:21,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2916550.0, ans=0.125 2024-08-15 00:17:29,582 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.322e+01 2.601e+01 3.039e+01 2.068e+02, threshold=5.202e+01, percent-clipped=5.0 2024-08-15 00:17:30,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2916650.0, ans=0.125 2024-08-15 00:17:52,284 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 1850, loss[loss=0.08438, beats_loss=0.01097, ecapa_loss=0.0001291, whisper_loss=0.07213, over 15091.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01064, ecapa_loss=0.0001467, whisper_loss=0.08965, over 3821561.59 frames. 
], batch size: 58, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:17:54,691 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-15 00:17:55,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2916850.0, ans=0.125 2024-08-15 00:17:56,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2916850.0, ans=0.125 2024-08-15 00:17:56,989 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.37 vs. limit=15.0 2024-08-15 00:18:11,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2916950.0, ans=0.04949747468305833 2024-08-15 00:18:12,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2916950.0, ans=0.125 2024-08-15 00:18:21,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2917050.0, ans=0.04949747468305833 2024-08-15 00:18:27,912 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.367e+00 2024-08-15 00:18:39,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2917150.0, ans=0.0 2024-08-15 00:18:45,817 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.83 vs. limit=15.0 2024-08-15 00:19:00,858 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.52 vs. 
limit=22.5 2024-08-15 00:19:02,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2917250.0, ans=10.0 2024-08-15 00:19:04,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2917250.0, ans=0.125 2024-08-15 00:19:05,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2917250.0, ans=0.125 2024-08-15 00:19:05,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2917250.0, ans=0.125 2024-08-15 00:19:10,419 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 1900, loss[loss=0.09234, beats_loss=0.01321, ecapa_loss=0.0001395, whisper_loss=0.07773, over 21448.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01064, ecapa_loss=0.0001471, whisper_loss=0.08989, over 3839268.03 frames. ], batch size: 84, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:19:43,026 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0 2024-08-15 00:19:54,067 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 8 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-15 00:20:06,027 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.713e+01 2.388e+01 2.711e+01 3.006e+01 3.511e+02, threshold=5.422e+01, percent-clipped=5.0 2024-08-15 00:20:09,201 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-15 00:20:29,214 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 00:20:30,011 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 1950, loss[loss=0.09091, beats_loss=0.009074, ecapa_loss=0.000134, whisper_loss=0.08049, over 14243.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.01061, ecapa_loss=0.0001473, whisper_loss=0.0897, over 3822947.42 frames. ], batch size: 55, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:20:42,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2917850.0, ans=0.125 2024-08-15 00:20:51,795 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 27 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-15 00:21:25,635 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-15 00:21:39,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2918250.0, ans=0.0 2024-08-15 00:21:42,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2918250.0, ans=0.125 2024-08-15 00:21:52,695 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 2000, loss[loss=0.1111, beats_loss=0.009785, ecapa_loss=0.0001721, whisper_loss=0.0996, over 19790.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01058, ecapa_loss=0.0001469, whisper_loss=0.08942, over 3786308.34 frames. ], batch size: 76, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:21:54,032 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 17 from LS+wenet, 29 from Vox, 27 fro AS 2024-08-15 00:22:26,980 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.62 vs. limit=15.0 2024-08-15 00:22:34,539 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.05 vs. 
limit=15.0 2024-08-15 00:22:36,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2918550.0, ans=0.0 2024-08-15 00:22:49,151 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.256e+01 2.494e+01 2.913e+01 5.528e+01, threshold=4.988e+01, percent-clipped=1.0 2024-08-15 00:22:50,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=2918650.0, ans=15.0 2024-08-15 00:22:53,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2918650.0, ans=0.125 2024-08-15 00:22:57,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2918650.0, ans=0.125 2024-08-15 00:23:12,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2918750.0, ans=0.2 2024-08-15 00:23:15,063 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 2050, loss[loss=0.08625, beats_loss=0.01004, ecapa_loss=0.000154, whisper_loss=0.07467, over 22415.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01069, ecapa_loss=0.0001466, whisper_loss=0.08862, over 3789935.59 frames. ], batch size: 93, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:23:15,220 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
23 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-15 00:23:25,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2918850.0, ans=0.0 2024-08-15 00:23:37,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2918950.0, ans=0.125 2024-08-15 00:23:53,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2919050.0, ans=0.2 2024-08-15 00:23:57,814 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-15 00:24:06,649 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.676e-01 2024-08-15 00:24:08,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2919150.0, ans=0.1 2024-08-15 00:24:11,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2919150.0, ans=0.125 2024-08-15 00:24:30,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2919250.0, ans=0.125 2024-08-15 00:24:36,908 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 2100, loss[loss=0.1009, beats_loss=0.01074, ecapa_loss=0.0001821, whisper_loss=0.0883, over 22139.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01071, ecapa_loss=0.0001455, whisper_loss=0.08871, over 3787436.78 frames. ], batch size: 95, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:24:43,447 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0 2024-08-15 00:24:52,265 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
15 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-15 00:25:03,912 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 20 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-15 00:25:18,524 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.12 vs. limit=22.5 2024-08-15 00:25:22,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2919550.0, ans=0.125 2024-08-15 00:25:31,866 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.305e+01 2.496e+01 2.863e+01 3.632e+01, threshold=4.993e+01, percent-clipped=0.0 2024-08-15 00:25:46,424 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 35 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-15 00:25:57,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2919850.0, ans=0.125 2024-08-15 00:25:57,922 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 2150, loss[loss=0.1138, beats_loss=0.009979, ecapa_loss=0.0001437, whisper_loss=0.1024, over 20564.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01068, ecapa_loss=0.0001469, whisper_loss=0.08866, over 3793155.88 frames. ], batch size: 81, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:26:27,588 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
31 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-15 00:26:34,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2920050.0, ans=0.2 2024-08-15 00:26:52,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2920150.0, ans=0.1 2024-08-15 00:27:08,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2920250.0, ans=0.125 2024-08-15 00:27:11,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2920250.0, ans=0.125 2024-08-15 00:27:14,562 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-15 00:27:24,457 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.60 vs. limit=15.0 2024-08-15 00:27:26,832 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 2200, loss[loss=0.08842, beats_loss=0.01212, ecapa_loss=0.0001603, whisper_loss=0.0747, over 22183.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01065, ecapa_loss=0.0001482, whisper_loss=0.08898, over 3790649.02 frames. 
], batch size: 89, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:27:44,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2920450.0, ans=0.0 2024-08-15 00:27:48,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2920450.0, ans=0.1 2024-08-15 00:28:19,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2920650.0, ans=0.125 2024-08-15 00:28:19,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2920650.0, ans=0.125 2024-08-15 00:28:22,504 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.336e+01 2.682e+01 3.041e+01 4.507e+01, threshold=5.364e+01, percent-clipped=0.0 2024-08-15 00:28:38,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=2920750.0, ans=0.95 2024-08-15 00:28:47,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2920750.0, ans=0.0 2024-08-15 00:28:49,484 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 2250, loss[loss=0.09682, beats_loss=0.01169, ecapa_loss=0.0001194, whisper_loss=0.08393, over 17378.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01058, ecapa_loss=0.000149, whisper_loss=0.09023, over 3805966.86 frames. 
], batch size: 67, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:28:50,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2920850.0, ans=0.05 2024-08-15 00:28:52,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2920850.0, ans=0.125 2024-08-15 00:29:02,585 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-15 00:29:19,546 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.23 vs. limit=15.0 2024-08-15 00:29:59,048 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.75 vs. limit=15.0 2024-08-15 00:30:11,317 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 2300, loss[loss=0.1216, beats_loss=0.01057, ecapa_loss=0.0001377, whisper_loss=0.1096, over 24488.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01061, ecapa_loss=0.0001504, whisper_loss=0.09062, over 3849802.10 frames. ], batch size: 95, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:30:15,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2921350.0, ans=0.0 2024-08-15 00:30:17,978 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
29 from LS+wenet, 10 from Vox, 18 fro AS 2024-08-15 00:30:19,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2921350.0, ans=0.0 2024-08-15 00:30:24,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2921350.0, ans=0.125 2024-08-15 00:30:32,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2921450.0, ans=0.0 2024-08-15 00:30:37,321 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-15 00:30:44,665 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 20 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-15 00:30:45,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2921550.0, ans=0.125 2024-08-15 00:30:56,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2921650.0, ans=0.0 2024-08-15 00:31:04,690 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.272e+01 2.490e+01 2.826e+01 4.749e+01, threshold=4.979e+01, percent-clipped=0.0 2024-08-15 00:31:06,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2921650.0, ans=0.125 2024-08-15 00:31:12,338 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.02 vs. limit=15.0 2024-08-15 00:31:19,617 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-15 00:31:32,532 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 2350, loss[loss=0.1277, beats_loss=0.006365, ecapa_loss=0.0001842, whisper_loss=0.1195, over 22445.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01051, ecapa_loss=0.000151, whisper_loss=0.0911, over 3819600.37 frames. ], batch size: 89, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:31:43,931 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 21 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-15 00:31:47,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2921950.0, ans=0.0 2024-08-15 00:31:52,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2921950.0, ans=0.1 2024-08-15 00:31:52,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2921950.0, ans=0.125 2024-08-15 00:31:55,836 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-15 00:31:57,187 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-15 00:31:57,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2921950.0, ans=0.05 2024-08-15 00:32:06,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2922050.0, ans=0.125 2024-08-15 00:32:21,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2922150.0, ans=0.2 2024-08-15 00:32:49,546 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-15 00:32:53,749 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 2400, loss[loss=0.09774, beats_loss=0.01035, ecapa_loss=0.0001389, whisper_loss=0.086, over 21635.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01053, ecapa_loss=0.0001514, whisper_loss=0.09107, over 3842978.90 frames. 
], batch size: 85, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:33:08,827 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 26 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-15 00:33:12,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2922450.0, ans=0.0 2024-08-15 00:33:17,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2922450.0, ans=0.07 2024-08-15 00:33:18,449 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 22 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-15 00:33:21,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2922450.0, ans=0.1 2024-08-15 00:33:28,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2922550.0, ans=0.125 2024-08-15 00:33:36,006 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 16 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-15 00:33:47,134 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 26 from Vox, 19 fro AS 2024-08-15 00:33:50,225 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.275e+01 2.495e+01 2.898e+01 2.121e+02, threshold=4.990e+01, percent-clipped=1.0 2024-08-15 00:33:54,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2922650.0, ans=0.1 2024-08-15 00:34:15,618 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 2450, loss[loss=0.1245, beats_loss=0.01019, ecapa_loss=0.0001761, whisper_loss=0.1126, over 18500.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01058, ecapa_loss=0.0001507, whisper_loss=0.09037, over 3842563.55 frames. 
], batch size: 74, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:34:24,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2922850.0, ans=0.1 2024-08-15 00:34:40,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2922950.0, ans=0.125 2024-08-15 00:34:47,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2922950.0, ans=0.1 2024-08-15 00:35:08,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2923150.0, ans=0.1 2024-08-15 00:35:20,675 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2024-08-15 00:35:29,874 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 00:35:37,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2923350.0, ans=0.125 2024-08-15 00:35:38,639 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 2500, loss[loss=0.09036, beats_loss=0.01306, ecapa_loss=0.0001391, whisper_loss=0.07591, over 20300.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01066, ecapa_loss=0.0001499, whisper_loss=0.09014, over 3893230.85 frames. ], batch size: 84, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:35:38,836 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
23 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-15 00:35:47,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2923350.0, ans=0.1 2024-08-15 00:35:50,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2923350.0, ans=0.07 2024-08-15 00:36:14,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2923550.0, ans=0.125 2024-08-15 00:36:32,283 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.759e+01 2.328e+01 2.584e+01 2.965e+01 7.495e+01, threshold=5.168e+01, percent-clipped=2.0 2024-08-15 00:36:42,178 WARNING [optim.py:496] (2/4) Scaling gradients by 0.026615602895617485, model_norm_threshold=51.67815017700195 2024-08-15 00:36:42,353 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.809e+05, grad_sumsq=6.809e+05, orig_rms_sq=1.000e+00 2024-08-15 00:36:47,649 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 33 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-15 00:36:50,183 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 15 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-15 00:36:57,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2923750.0, ans=0.125 2024-08-15 00:36:59,185 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 2550, loss[loss=0.127, beats_loss=0.008796, ecapa_loss=0.0001753, whisper_loss=0.1165, over 22658.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01065, ecapa_loss=0.000151, whisper_loss=0.09026, over 3910871.75 frames. 
], batch size: 92, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:37:00,101 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.30 vs. limit=10.0 2024-08-15 00:37:07,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2923850.0, ans=0.2 2024-08-15 00:37:29,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2924050.0, ans=0.1 2024-08-15 00:37:35,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2924050.0, ans=0.125 2024-08-15 00:37:55,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2924150.0, ans=0.1 2024-08-15 00:38:13,175 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.12 vs. limit=15.0 2024-08-15 00:38:14,749 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.03 vs. limit=12.0 2024-08-15 00:38:16,901 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 2600, loss[loss=0.08481, beats_loss=0.01215, ecapa_loss=0.0001283, whisper_loss=0.07137, over 21985.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01071, ecapa_loss=0.0001497, whisper_loss=0.08987, over 3903594.88 frames. 
], batch size: 89, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:38:19,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2924350.0, ans=0.125 2024-08-15 00:38:32,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2924450.0, ans=0.0 2024-08-15 00:38:52,381 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.09 vs. limit=15.0 2024-08-15 00:38:58,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2924550.0, ans=0.1 2024-08-15 00:39:08,646 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.339e+01 2.635e+01 2.939e+01 1.942e+03, threshold=5.270e+01, percent-clipped=3.0 2024-08-15 00:39:11,475 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2924650.0, ans=0.04949747468305833 2024-08-15 00:39:12,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2924650.0, ans=0.125 2024-08-15 00:39:26,270 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.76 vs. limit=12.0 2024-08-15 00:39:32,851 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 2650, loss[loss=0.1068, beats_loss=0.0112, ecapa_loss=0.0001515, whisper_loss=0.0941, over 18521.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.0001513, whisper_loss=0.09042, over 3901538.77 frames. 
], batch size: 73, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:39:55,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2924950.0, ans=0.0 2024-08-15 00:40:01,841 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-15 00:40:15,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2925050.0, ans=0.04949747468305833 2024-08-15 00:40:17,851 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 23 from Vox, 16 fro AS 2024-08-15 00:40:21,380 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.282e-01 2024-08-15 00:40:30,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2925150.0, ans=0.125 2024-08-15 00:40:34,340 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.47 vs. limit=15.0 2024-08-15 00:40:50,039 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 2700, loss[loss=0.1099, beats_loss=0.01096, ecapa_loss=0.0001267, whisper_loss=0.09766, over 14253.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01062, ecapa_loss=0.0001509, whisper_loss=0.0899, over 3860205.57 frames. ], batch size: 55, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:41:25,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2925550.0, ans=0.2 2024-08-15 00:41:32,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2925550.0, ans=0.125 2024-08-15 00:41:37,204 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-15 00:41:40,708 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
26 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-15 00:41:43,694 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.535e+01 2.279e+01 2.490e+01 2.726e+01 4.419e+01, threshold=4.980e+01, percent-clipped=0.0 2024-08-15 00:42:04,641 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-15 00:42:07,368 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.99 vs. limit=22.5 2024-08-15 00:42:09,553 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 2750, loss[loss=0.09236, beats_loss=0.01269, ecapa_loss=0.000113, whisper_loss=0.07854, over 18346.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01063, ecapa_loss=0.0001491, whisper_loss=0.09012, over 3829976.99 frames. ], batch size: 72, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:42:40,122 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-15 00:42:49,545 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-15 00:42:58,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2926150.0, ans=0.2 2024-08-15 00:42:59,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2926150.0, ans=0.2 2024-08-15 00:43:11,960 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.47 vs. limit=10.0 2024-08-15 00:43:32,306 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 2800, loss[loss=0.1221, beats_loss=0.007257, ecapa_loss=0.0001482, whisper_loss=0.1134, over 18154.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01058, ecapa_loss=0.0001492, whisper_loss=0.09046, over 3831304.65 frames. 
], batch size: 68, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:43:44,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2926350.0, ans=0.125 2024-08-15 00:43:48,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2926350.0, ans=0.0 2024-08-15 00:43:54,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2926450.0, ans=0.1 2024-08-15 00:44:06,419 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 00:44:25,442 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.70 vs. limit=15.0 2024-08-15 00:44:28,208 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 29 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-15 00:44:31,117 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.375e+01 2.624e+01 2.895e+01 7.200e+01, threshold=5.247e+01, percent-clipped=1.0 2024-08-15 00:44:57,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2926750.0, ans=0.125 2024-08-15 00:45:00,080 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 2850, loss[loss=0.1091, beats_loss=0.009963, ecapa_loss=0.0001708, whisper_loss=0.09745, over 18049.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01061, ecapa_loss=0.0001483, whisper_loss=0.0908, over 3813081.69 frames. ], batch size: 71, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:45:01,650 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
25 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-15 00:45:02,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2926850.0, ans=0.125 2024-08-15 00:45:37,383 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-15 00:46:06,765 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-15 00:46:24,462 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 2900, loss[loss=0.09095, beats_loss=0.009931, ecapa_loss=0.0001775, whisper_loss=0.07925, over 13419.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01061, ecapa_loss=0.0001494, whisper_loss=0.09085, over 3837046.68 frames. ], batch size: 55, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:46:30,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2927350.0, ans=0.125 2024-08-15 00:46:52,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2927450.0, ans=0.1 2024-08-15 00:46:52,631 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=12.0 2024-08-15 00:46:54,803 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-15 00:47:16,042 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.90 vs. 
limit=12.0 2024-08-15 00:47:24,859 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.340e+01 2.621e+01 2.930e+01 9.659e+01, threshold=5.242e+01, percent-clipped=1.0 2024-08-15 00:47:43,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2927750.0, ans=0.2 2024-08-15 00:47:44,560 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 41 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-15 00:47:52,106 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0 2024-08-15 00:47:52,605 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 2950, loss[loss=0.09823, beats_loss=0.01152, ecapa_loss=0.0001594, whisper_loss=0.08512, over 18308.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01059, ecapa_loss=0.0001512, whisper_loss=0.09097, over 3852899.60 frames. ], batch size: 75, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:47:57,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2927850.0, ans=0.1 2024-08-15 00:47:58,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2927850.0, ans=0.2 2024-08-15 00:47:58,880 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.97 vs. limit=22.5 2024-08-15 00:48:23,411 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 28 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-15 00:48:41,507 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
34 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 00:48:42,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2928050.0, ans=0.09899494936611666 2024-08-15 00:48:49,183 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-15 00:48:59,921 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.60 vs. limit=22.5 2024-08-15 00:49:14,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2928250.0, ans=0.07 2024-08-15 00:49:22,673 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 3000, loss[loss=0.07899, beats_loss=0.01411, ecapa_loss=8.535e-05, whisper_loss=0.06402, over 15674.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01059, ecapa_loss=0.0001504, whisper_loss=0.09127, over 3887384.48 frames. ], batch size: 58, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:49:22,673 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-15 00:50:02,519 INFO [train_multi_KD3.py:1149] (2/4) Epoch 21, validation on ASR_libri: loss=0.2533, beats_loss=0, ecapa_loss=0.0005339, whisper_loss=0.248, over 922467.00 frames. 2024-08-15 00:50:21,799 INFO [train_multi_KD3.py:1149] (2/4) Epoch 21, validation on SV_voxceleb1: loss=0.004208, beats_loss=0, ecapa_loss=0.0004208, whisper_loss=0, over 939242.00 frames. 2024-08-15 00:52:15,974 INFO [train_multi_KD3.py:1149] (2/4) Epoch 21, validation on AT_audioset: loss=0.02337, beats_loss=0.02337, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-15 00:52:15,977 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-15 00:52:27,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2928350.0, ans=0.035 2024-08-15 00:52:31,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2928450.0, ans=0.09899494936611666 2024-08-15 00:52:36,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2928450.0, ans=0.125 2024-08-15 00:52:40,499 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 22 from LS+wenet, 8 from Vox, 30 fro AS 2024-08-15 00:52:47,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2928550.0, ans=0.0 2024-08-15 00:53:06,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2928650.0, ans=0.125 2024-08-15 00:53:07,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2928650.0, ans=0.0 2024-08-15 00:53:08,687 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.494e+01 2.724e+01 3.053e+01 2.712e+02, threshold=5.448e+01, percent-clipped=2.0 2024-08-15 00:53:31,681 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 22 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-15 00:53:37,456 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 3050, loss[loss=0.1163, beats_loss=0.01024, ecapa_loss=0.0001475, whisper_loss=0.1046, over 19979.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01054, ecapa_loss=0.000151, whisper_loss=0.09204, over 3891817.20 frames. ], batch size: 78, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:53:53,406 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
30 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-15 00:54:06,546 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-15 00:54:08,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2928950.0, ans=0.125 2024-08-15 00:54:16,678 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 30 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-15 00:54:24,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2929050.0, ans=0.0 2024-08-15 00:54:28,076 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.41 vs. limit=12.0 2024-08-15 00:55:03,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2929250.0, ans=0.1 2024-08-15 00:55:06,014 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 3100, loss[loss=0.07581, beats_loss=0.01305, ecapa_loss=0.000133, whisper_loss=0.06143, over 21973.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01056, ecapa_loss=0.0001512, whisper_loss=0.09211, over 3873225.28 frames. ], batch size: 92, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:55:10,317 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.80 vs. limit=12.0 2024-08-15 00:55:28,653 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-15 00:55:29,617 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.78 vs. limit=15.0 2024-08-15 00:55:37,015 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
35 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 00:55:56,841 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.49 vs. limit=15.0 2024-08-15 00:56:02,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2929650.0, ans=0.1 2024-08-15 00:56:05,860 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.234e+01 2.421e+01 2.829e+01 3.932e+01, threshold=4.842e+01, percent-clipped=0.0 2024-08-15 00:56:26,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2929750.0, ans=10.0 2024-08-15 00:56:34,183 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 3150, loss[loss=0.09068, beats_loss=0.01142, ecapa_loss=0.0001462, whisper_loss=0.0778, over 18106.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01061, ecapa_loss=0.000151, whisper_loss=0.0922, over 3852669.34 frames. ], batch size: 74, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:56:37,719 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 33 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-15 00:56:59,426 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-15 00:57:01,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2929950.0, ans=0.125 2024-08-15 00:57:06,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2930050.0, ans=0.0 2024-08-15 00:57:19,085 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.47 vs. limit=12.0 2024-08-15 00:57:21,423 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
25 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-15 00:57:21,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2930050.0, ans=0.125 2024-08-15 00:57:51,382 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.51 vs. limit=15.0 2024-08-15 00:57:58,655 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 3200, loss[loss=0.09424, beats_loss=0.01262, ecapa_loss=0.0001483, whisper_loss=0.08013, over 20071.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01067, ecapa_loss=0.0001506, whisper_loss=0.09136, over 3846058.71 frames. ], batch size: 81, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:58:03,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2930350.0, ans=0.0 2024-08-15 00:58:06,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2930350.0, ans=0.125 2024-08-15 00:58:06,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2930350.0, ans=0.0 2024-08-15 00:58:21,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2930450.0, ans=0.2 2024-08-15 00:58:39,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2930550.0, ans=0.0 2024-08-15 00:58:51,587 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
19 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-15 00:58:57,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2930650.0, ans=0.125 2024-08-15 00:58:58,098 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.340e+01 2.607e+01 2.888e+01 4.627e+01, threshold=5.214e+01, percent-clipped=0.0 2024-08-15 00:58:59,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2930650.0, ans=0.04949747468305833 2024-08-15 00:59:09,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2930750.0, ans=0.125 2024-08-15 00:59:10,510 WARNING [optim.py:496] (2/4) Scaling gradients by 0.07108230143785477, model_norm_threshold=52.141944885253906 2024-08-15 00:59:10,687 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.209e+04, grad_sumsq=9.209e+04, orig_rms_sq=1.000e+00 2024-08-15 00:59:26,745 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 3250, loss[loss=0.1034, beats_loss=0.01105, ecapa_loss=0.0001613, whisper_loss=0.09077, over 19271.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01063, ecapa_loss=0.0001525, whisper_loss=0.09171, over 3856876.05 frames. 
], batch size: 77, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:59:36,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2930850.0, ans=0.0 2024-08-15 00:59:39,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2930850.0, ans=0.1 2024-08-15 00:59:52,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2930950.0, ans=0.125 2024-08-15 00:59:53,415 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-15 00:59:55,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2930950.0, ans=0.125 2024-08-15 00:59:58,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2930950.0, ans=0.0 2024-08-15 00:59:59,090 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2024-08-15 01:00:05,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2931050.0, ans=0.0 2024-08-15 01:00:26,395 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-15 01:00:32,103 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-15 01:00:53,144 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 3300, loss[loss=0.1177, beats_loss=0.008866, ecapa_loss=0.0001452, whisper_loss=0.1074, over 22901.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01066, ecapa_loss=0.0001516, whisper_loss=0.09166, over 3887487.77 frames. 
], batch size: 89, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:00:57,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2931350.0, ans=0.125 2024-08-15 01:01:19,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2931450.0, ans=0.2 2024-08-15 01:01:47,525 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.317e+01 2.622e+01 2.887e+01 7.335e+02, threshold=5.244e+01, percent-clipped=2.0 2024-08-15 01:02:14,782 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 3350, loss[loss=0.1054, beats_loss=0.0119, ecapa_loss=0.0001366, whisper_loss=0.09214, over 22146.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0107, ecapa_loss=0.0001513, whisper_loss=0.0913, over 3924906.04 frames. ], batch size: 88, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:02:23,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2931850.0, ans=0.2 2024-08-15 01:02:38,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2931950.0, ans=0.1 2024-08-15 01:02:55,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2932050.0, ans=0.0 2024-08-15 01:02:56,133 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.46 vs. limit=22.5 2024-08-15 01:03:15,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2932150.0, ans=0.125 2024-08-15 01:03:22,590 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
32 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-15 01:03:45,976 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 3400, loss[loss=0.08889, beats_loss=0.01409, ecapa_loss=0.0001136, whisper_loss=0.07366, over 17898.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01068, ecapa_loss=0.0001506, whisper_loss=0.09146, over 3924019.25 frames. ], batch size: 69, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:03:48,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2932350.0, ans=0.125 2024-08-15 01:03:53,175 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 29 from LS+wenet, 26 from Vox, 21 fro AS 2024-08-15 01:03:58,715 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-15 01:04:03,289 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.08 vs. limit=15.0 2024-08-15 01:04:10,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2932450.0, ans=0.125 2024-08-15 01:04:15,227 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 19 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-15 01:04:18,331 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-15 01:04:23,039 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
21 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-15 01:04:40,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2932650.0, ans=0.125 2024-08-15 01:04:44,582 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.702e+01 2.349e+01 2.635e+01 2.975e+01 2.960e+02, threshold=5.270e+01, percent-clipped=3.0 2024-08-15 01:05:03,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2932750.0, ans=0.04949747468305833 2024-08-15 01:05:14,100 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 3450, loss[loss=0.04557, beats_loss=0.01434, ecapa_loss=0.0001408, whisper_loss=0.02982, over 12773.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01067, ecapa_loss=0.0001515, whisper_loss=0.09084, over 3890123.58 frames. ], batch size: 54, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:05:21,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2932850.0, ans=0.125 2024-08-15 01:05:25,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2932850.0, ans=0.125 2024-08-15 01:05:43,279 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 16 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-15 01:05:45,390 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 01:05:52,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2933050.0, ans=0.2 2024-08-15 01:05:58,722 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-15 01:06:11,881 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
19 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-15 01:06:32,689 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.22 vs. limit=15.0 2024-08-15 01:06:47,202 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 3500, loss[loss=0.07478, beats_loss=0.01352, ecapa_loss=0.0001784, whisper_loss=0.05948, over 17373.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01069, ecapa_loss=0.0001525, whisper_loss=0.09062, over 3864576.88 frames. ], batch size: 75, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:06:52,443 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0579170286655426, model_norm_threshold=52.703590393066406 2024-08-15 01:06:52,606 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.021e+05, grad_sumsq=2.962e+04, orig_rms_sq=3.448e+00 2024-08-15 01:07:03,934 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-15 01:07:04,853 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.53 vs. limit=12.0 2024-08-15 01:07:29,139 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
23 from LS+wenet, 19 from Vox, 15 fro AS 2024-08-15 01:07:35,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2933550.0, ans=0.125 2024-08-15 01:07:44,496 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.354e+01 2.571e+01 2.919e+01 9.100e+02, threshold=5.142e+01, percent-clipped=1.0 2024-08-15 01:07:45,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2933650.0, ans=0.1 2024-08-15 01:07:58,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2933750.0, ans=0.0 2024-08-15 01:08:10,698 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 3550, loss[loss=0.09662, beats_loss=0.01089, ecapa_loss=0.0001481, whisper_loss=0.08425, over 21935.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01071, ecapa_loss=0.000152, whisper_loss=0.09022, over 3858635.77 frames. ], batch size: 89, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:08:52,585 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-15 01:09:12,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2934150.0, ans=0.125 2024-08-15 01:09:22,993 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.21 vs. limit=8.0 2024-08-15 01:09:33,292 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 3600, loss[loss=0.1162, beats_loss=0.008739, ecapa_loss=0.0001446, whisper_loss=0.1061, over 19975.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01063, ecapa_loss=0.0001518, whisper_loss=0.09019, over 3834964.96 frames. 
], batch size: 74, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:09:48,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2934350.0, ans=0.125 2024-08-15 01:09:51,182 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 19 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-15 01:09:54,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2934450.0, ans=0.1 2024-08-15 01:09:57,949 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-15 01:10:02,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2934450.0, ans=0.025 2024-08-15 01:10:03,066 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-15 01:10:17,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2934550.0, ans=0.125 2024-08-15 01:10:17,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2934550.0, ans=0.0 2024-08-15 01:10:23,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2934650.0, ans=0.0 2024-08-15 01:10:31,169 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.251e+01 2.514e+01 2.875e+01 6.843e+01, threshold=5.029e+01, percent-clipped=1.0 2024-08-15 01:10:32,924 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
29 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-15 01:10:34,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2934650.0, ans=0.1 2024-08-15 01:10:41,200 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-15 01:10:52,232 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-15 01:10:56,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2934750.0, ans=0.125 2024-08-15 01:10:58,977 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 3650, loss[loss=0.122, beats_loss=0.008834, ecapa_loss=0.0001567, whisper_loss=0.1116, over 19643.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01067, ecapa_loss=0.0001522, whisper_loss=0.09013, over 3814389.33 frames. ], batch size: 77, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:11:20,242 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.58 vs. limit=15.0 2024-08-15 01:11:23,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2934950.0, ans=0.0 2024-08-15 01:11:33,324 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2024-08-15 01:11:52,272 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-15 01:12:01,883 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 14 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-15 01:12:11,588 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
25 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-15 01:12:19,902 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 3700, loss[loss=0.09456, beats_loss=0.008629, ecapa_loss=0.00019, whisper_loss=0.08403, over 17931.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01066, ecapa_loss=0.0001531, whisper_loss=0.09001, over 3861677.31 frames. ], batch size: 75, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:12:27,574 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-15 01:12:28,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2935350.0, ans=0.125 2024-08-15 01:12:35,445 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-15 01:12:38,007 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 19 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-15 01:12:47,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2935450.0, ans=0.125 2024-08-15 01:13:03,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2935550.0, ans=0.125 2024-08-15 01:13:10,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2935650.0, ans=0.0 2024-08-15 01:13:17,737 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.268e+01 2.455e+01 2.851e+01 9.066e+01, threshold=4.910e+01, percent-clipped=1.0 2024-08-15 01:13:40,760 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 10 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-15 01:13:45,932 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 3750, loss[loss=0.09047, beats_loss=0.01106, ecapa_loss=0.0001547, whisper_loss=0.07787, over 23144.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01067, ecapa_loss=0.0001533, whisper_loss=0.09008, over 3869832.33 frames. ], batch size: 93, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:14:09,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2935950.0, ans=0.1 2024-08-15 01:14:15,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2935950.0, ans=0.125 2024-08-15 01:14:21,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2936050.0, ans=0.0 2024-08-15 01:14:31,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2936050.0, ans=0.0 2024-08-15 01:14:39,435 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 18 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 01:14:57,675 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 01:15:04,380 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-15 01:15:11,058 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 3800, loss[loss=0.1058, beats_loss=0.01039, ecapa_loss=0.0001703, whisper_loss=0.09366, over 22362.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01068, ecapa_loss=0.0001536, whisper_loss=0.09043, over 3860271.47 frames. ], batch size: 91, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:15:42,947 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 01:15:55,814 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
22 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-15 01:16:01,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2936650.0, ans=0.0 2024-08-15 01:16:06,989 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.293e+01 2.527e+01 3.101e+01 3.900e+02, threshold=5.055e+01, percent-clipped=2.0 2024-08-15 01:16:29,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2936750.0, ans=0.0 2024-08-15 01:16:29,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2936750.0, ans=0.125 2024-08-15 01:16:34,151 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 3850, loss[loss=0.1218, beats_loss=0.01013, ecapa_loss=0.0001453, whisper_loss=0.1102, over 21076.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01068, ecapa_loss=0.0001525, whisper_loss=0.09101, over 3877106.18 frames. ], batch size: 84, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:16:56,849 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-15 01:17:09,408 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-15 01:17:37,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2937150.0, ans=0.125 2024-08-15 01:17:37,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2937150.0, ans=0.125 2024-08-15 01:17:46,248 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 24 from LS+wenet, 25 from Vox, 18 fro AS 2024-08-15 01:17:57,852 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
28 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-15 01:18:01,529 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 3900, loss[loss=0.07307, beats_loss=0.01189, ecapa_loss=0.0001874, whisper_loss=0.0593, over 14042.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01061, ecapa_loss=0.0001536, whisper_loss=0.09137, over 3888693.97 frames. ], batch size: 56, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:18:06,359 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 27 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-15 01:18:06,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2937350.0, ans=0.0 2024-08-15 01:18:13,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2937350.0, ans=0.1 2024-08-15 01:18:13,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2937350.0, ans=0.1 2024-08-15 01:18:20,933 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-15 01:18:39,954 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-15 01:18:43,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2937550.0, ans=0.0 2024-08-15 01:18:49,834 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
32 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-15 01:18:59,272 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+01 2.342e+01 2.596e+01 2.902e+01 5.795e+01, threshold=5.192e+01, percent-clipped=1.0 2024-08-15 01:19:17,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2937750.0, ans=0.2 2024-08-15 01:19:27,351 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 3950, loss[loss=0.0896, beats_loss=0.01256, ecapa_loss=0.0001321, whisper_loss=0.07571, over 23090.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01058, ecapa_loss=0.000154, whisper_loss=0.09181, over 3893573.98 frames. ], batch size: 92, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:19:33,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2937850.0, ans=0.125 2024-08-15 01:19:52,531 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 01:20:03,252 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 32 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-15 01:20:13,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2938050.0, ans=0.0 2024-08-15 01:20:19,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2938050.0, ans=0.0 2024-08-15 01:20:36,768 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 19 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-15 01:20:38,691 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 21 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-15 01:20:43,488 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-15 01:20:45,175 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
19 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-15 01:20:54,759 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-15 01:20:56,624 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 4000, loss[loss=0.08772, beats_loss=0.01081, ecapa_loss=0.0001623, whisper_loss=0.07528, over 21519.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01056, ecapa_loss=0.0001535, whisper_loss=0.09163, over 3892918.11 frames. ], batch size: 88, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:21:02,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2938350.0, ans=0.125 2024-08-15 01:21:07,437 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-08-15 01:21:47,010 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 16 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-15 01:21:51,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2938650.0, ans=0.0 2024-08-15 01:21:58,476 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.427e+01 2.607e+01 2.957e+01 4.809e+01, threshold=5.214e+01, percent-clipped=0.0 2024-08-15 01:21:58,662 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
29 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-15 01:22:19,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2938750.0, ans=0.0 2024-08-15 01:22:19,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2938750.0, ans=0.125 2024-08-15 01:22:20,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2938750.0, ans=0.2 2024-08-15 01:22:23,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2938750.0, ans=0.1 2024-08-15 01:22:27,523 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 4050, loss[loss=0.08887, beats_loss=0.01012, ecapa_loss=0.0001715, whisper_loss=0.07704, over 18889.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0106, ecapa_loss=0.0001528, whisper_loss=0.09141, over 3904728.61 frames. ], batch size: 79, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:22:37,117 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 20 from LS+wenet, 33 from Vox, 40 fro AS 2024-08-15 01:22:39,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2938850.0, ans=0.2 2024-08-15 01:22:42,024 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=15.0 2024-08-15 01:22:42,300 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. 
limit=6.0 2024-08-15 01:23:07,944 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.798e-01 2024-08-15 01:23:22,606 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.65 vs. limit=10.0 2024-08-15 01:23:49,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2939250.0, ans=0.1 2024-08-15 01:23:49,189 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.57 vs. limit=15.0 2024-08-15 01:23:51,730 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-15 01:23:58,941 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 4100, loss[loss=0.1133, beats_loss=0.01032, ecapa_loss=0.0001384, whisper_loss=0.1015, over 20447.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01068, ecapa_loss=0.0001536, whisper_loss=0.09185, over 3901484.11 frames. ], batch size: 77, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:23:59,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2939350.0, ans=0.125 2024-08-15 01:24:14,394 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.13 vs. limit=5.0 2024-08-15 01:24:34,029 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-15 01:24:44,893 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.17 vs. 
limit=22.5 2024-08-15 01:24:58,300 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+01 2.312e+01 2.568e+01 2.993e+01 3.291e+02, threshold=5.136e+01, percent-clipped=2.0 2024-08-15 01:24:59,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2939650.0, ans=0.0 2024-08-15 01:25:26,417 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 4150, loss[loss=0.114, beats_loss=0.009196, ecapa_loss=0.0001404, whisper_loss=0.1034, over 22856.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01063, ecapa_loss=0.0001548, whisper_loss=0.09252, over 3907153.22 frames. ], batch size: 89, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:25:43,365 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-15 01:25:47,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2939950.0, ans=0.2 2024-08-15 01:26:18,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2940150.0, ans=0.0 2024-08-15 01:26:25,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2940150.0, ans=0.125 2024-08-15 01:26:52,915 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 4200, loss[loss=0.08888, beats_loss=0.01121, ecapa_loss=0.0001438, whisper_loss=0.07623, over 19544.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01065, ecapa_loss=0.0001549, whisper_loss=0.09198, over 3905948.59 frames. ], batch size: 79, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:26:54,572 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
19 from LS+wenet, 14 from Vox, 26 from AS 2024-08-15 01:26:59,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2940350.0, ans=0.125 2024-08-15 01:27:09,397 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 24 from Vox, 40 from AS 2024-08-15 01:27:16,973 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.024e+05 2024-08-15 01:27:26,242 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.32 vs. limit=12.0 2024-08-15 01:27:33,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2940550.0, ans=0.125 2024-08-15 01:27:44,023 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.316e+01 2.522e+01 2.873e+01 3.693e+01, threshold=5.044e+01, percent-clipped=0.0 2024-08-15 01:27:57,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2940750.0, ans=0.125 2024-08-15 01:28:05,853 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=12.0 2024-08-15 01:28:06,729 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 4250, loss[loss=0.08545, beats_loss=0.01253, ecapa_loss=0.0001389, whisper_loss=0.07153, over 17052.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01067, ecapa_loss=0.0001538, whisper_loss=0.092, over 3906899.74 frames. ], batch size: 71, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:28:18,517 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
22 from LS+wenet, 20 from Vox, 34 from AS 2024-08-15 01:28:32,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2940950.0, ans=0.1 2024-08-15 01:28:36,223 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 29 from Vox, 38 from AS 2024-08-15 01:28:39,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2941050.0, ans=0.125 2024-08-15 01:29:05,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2941250.0, ans=0.125 2024-08-15 01:29:07,786 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 25 from Vox, 35 from AS 2024-08-15 01:29:11,802 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 18 from Vox, 44 from AS 2024-08-15 01:29:13,857 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 4300, loss[loss=0.1158, beats_loss=0.007968, ecapa_loss=0.0001717, whisper_loss=0.1061, over 22277.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01058, ecapa_loss=0.0001529, whisper_loss=0.09202, over 3927136.12 frames. ], batch size: 89, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:29:30,598 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.35 vs. limit=15.0 2024-08-15 01:29:34,529 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0 2024-08-15 01:29:58,838 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.262e+01 2.467e+01 2.860e+01 4.963e+01, threshold=4.934e+01, percent-clipped=0.0 2024-08-15 01:30:15,711 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
24 from LS+wenet, 16 from Vox, 30 from AS 2024-08-15 01:30:19,778 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 4350, loss[loss=0.1193, beats_loss=0.008217, ecapa_loss=0.0001763, whisper_loss=0.1093, over 22587.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01049, ecapa_loss=0.0001535, whisper_loss=0.09177, over 3904073.28 frames. ], batch size: 92, lr: 2.98e-03, grad_scale: 1.152921504606847e+18 2024-08-15 01:30:35,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2941950.0, ans=0.125 2024-08-15 01:30:43,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2941950.0, ans=0.125 2024-08-15 01:31:10,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2942150.0, ans=0.2 2024-08-15 01:31:25,787 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 4400, loss[loss=0.1042, beats_loss=0.01208, ecapa_loss=0.0001447, whisper_loss=0.09065, over 22762.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01053, ecapa_loss=0.0001537, whisper_loss=0.09173, over 3922487.43 frames. ], batch size: 94, lr: 2.98e-03, grad_scale: 1.152921504606847e+18 2024-08-15 01:31:26,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2942350.0, ans=0.125 2024-08-15 01:31:41,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2942450.0, ans=0.125 2024-08-15 01:31:51,702 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 18 from Vox, 33 from AS 2024-08-15 01:31:53,130 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
31 from LS+wenet, 22 from Vox, 35 from AS 2024-08-15 01:32:01,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2942550.0, ans=0.05 2024-08-15 01:32:01,790 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.06 vs. limit=15.0 2024-08-15 01:32:04,936 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 9 from LS+wenet, 11 from Vox, 34 from AS 2024-08-15 01:32:09,928 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.385e+01 2.562e+01 2.975e+01 4.289e+01, threshold=5.125e+01, percent-clipped=0.0 2024-08-15 01:32:12,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2942650.0, ans=0.125 2024-08-15 01:32:16,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2942750.0, ans=0.125 2024-08-15 01:32:20,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2942750.0, ans=0.0 2024-08-15 01:32:24,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2942750.0, ans=0.1 2024-08-15 01:32:26,164 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0 2024-08-15 01:32:28,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2942750.0, ans=0.125 2024-08-15 01:32:30,911 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 4450, loss[loss=0.0981, beats_loss=0.01182, ecapa_loss=0.0001361, whisper_loss=0.08492, over 20308.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01055, ecapa_loss=0.0001534, whisper_loss=0.09121, over 3912291.65 frames. ], batch size: 81, lr: 2.98e-03, grad_scale: 1.152921504606847e+18 2024-08-15 01:32:34,829 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 17 from Vox, 38 from AS 2024-08-15 01:32:41,508 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 01:32:57,728 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.91 vs. limit=22.5 2024-08-15 01:32:58,626 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.638e+01 2024-08-15 01:33:03,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2943050.0, ans=0.0 2024-08-15 01:33:16,897 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.12 vs. limit=15.0 2024-08-15 01:33:27,716 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.93 vs. limit=15.0 2024-08-15 01:33:35,717 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.08 vs. limit=15.0 2024-08-15 01:33:36,210 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 4500, loss[loss=0.08942, beats_loss=0.01337, ecapa_loss=0.0001478, whisper_loss=0.07457, over 16086.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01058, ecapa_loss=0.0001525, whisper_loss=0.09098, over 3874119.24 frames. ], batch size: 64, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:33:41,859 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
20 from LS+wenet, 25 from Vox, 29 from AS 2024-08-15 01:33:54,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2943450.0, ans=0.1 2024-08-15 01:33:56,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2943450.0, ans=0.125 2024-08-15 01:33:58,361 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.58 vs. limit=15.0 2024-08-15 01:34:05,499 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.53 vs. limit=12.0 2024-08-15 01:34:23,171 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.368e+01 2.661e+01 3.187e+01 2.204e+02, threshold=5.323e+01, percent-clipped=1.0 2024-08-15 01:34:31,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2943750.0, ans=0.2 2024-08-15 01:34:32,012 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 19 from LS+wenet, 26 from Vox, 35 from AS 2024-08-15 01:34:34,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2943750.0, ans=0.0 2024-08-15 01:34:37,495 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 14 from LS+wenet, 19 from Vox, 30 from AS 2024-08-15 01:34:40,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2943750.0, ans=0.125 2024-08-15 01:34:42,806 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 4550, loss[loss=0.09638, beats_loss=0.00953, ecapa_loss=0.0001609, whisper_loss=0.08524, over 22292.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01056, ecapa_loss=0.0001542, whisper_loss=0.09095, over 3902682.42 frames. 
], batch size: 92, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:34:44,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2943850.0, ans=0.125 2024-08-15 01:34:45,478 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 33 from LS+wenet, 11 from Vox, 26 from AS 2024-08-15 01:34:57,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2943950.0, ans=0.1 2024-08-15 01:35:07,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2943950.0, ans=0.125 2024-08-15 01:35:14,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2944050.0, ans=0.125 2024-08-15 01:35:19,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2944050.0, ans=0.0 2024-08-15 01:35:23,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2944150.0, ans=0.2 2024-08-15 01:35:42,188 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 19 from Vox, 33 from AS 2024-08-15 01:35:44,960 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 14 from LS+wenet, 18 from Vox, 31 from AS 2024-08-15 01:35:48,581 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 4600, loss[loss=0.07333, beats_loss=0.01172, ecapa_loss=0.0001575, whisper_loss=0.06004, over 13024.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01059, ecapa_loss=0.0001529, whisper_loss=0.09089, over 3901210.98 frames. 
], batch size: 56, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:35:57,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2944350.0, ans=0.125 2024-08-15 01:36:05,858 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 15 from Vox, 21 from AS 2024-08-15 01:36:07,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2944450.0, ans=0.2 2024-08-15 01:36:13,429 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 29 from LS+wenet, 22 from Vox, 26 from AS 2024-08-15 01:36:13,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2944550.0, ans=0.035 2024-08-15 01:36:25,090 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 13 from LS+wenet, 15 from Vox, 30 from AS 2024-08-15 01:36:28,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2944650.0, ans=0.125 2024-08-15 01:36:33,902 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.310e+01 2.603e+01 2.915e+01 4.398e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-15 01:36:35,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2944650.0, ans=0.125 2024-08-15 01:36:40,412 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 17 from LS+wenet, 16 from Vox, 40 from AS 2024-08-15 01:36:49,051 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 from AS 2024-08-15 01:36:53,788 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 4650, loss[loss=0.07883, beats_loss=0.01448, ecapa_loss=0.0001715, whisper_loss=0.06264, over 16386.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01066, ecapa_loss=0.0001538, whisper_loss=0.09023, over 3856834.26 frames. 
], batch size: 72, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:37:03,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2944850.0, ans=0.0 2024-08-15 01:37:04,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2944850.0, ans=0.1 2024-08-15 01:37:11,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2944950.0, ans=0.035 2024-08-15 01:37:12,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2944950.0, ans=0.125 2024-08-15 01:37:16,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2944950.0, ans=0.1 2024-08-15 01:37:19,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2945050.0, ans=0.125 2024-08-15 01:37:22,848 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 10 from Vox, 39 from AS 2024-08-15 01:37:24,199 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 18 from Vox, 28 from AS 2024-08-15 01:37:31,114 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 31 from LS+wenet, 20 from Vox, 30 from AS 2024-08-15 01:37:48,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2945250.0, ans=0.125 2024-08-15 01:37:59,187 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 4700, loss[loss=0.09872, beats_loss=0.01066, ecapa_loss=0.0001487, whisper_loss=0.08658, over 23227.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01068, ecapa_loss=0.0001535, whisper_loss=0.09029, over 3866701.95 frames. 
], batch size: 93, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:38:18,314 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.60 vs. limit=15.0 2024-08-15 01:38:21,367 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 from AS 2024-08-15 01:38:23,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2945450.0, ans=0.5 2024-08-15 01:38:41,134 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 33 from LS+wenet, 19 from Vox, 26 from AS 2024-08-15 01:38:44,924 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.361e+01 2.586e+01 2.935e+01 3.925e+01, threshold=5.173e+01, percent-clipped=0.0 2024-08-15 01:38:48,150 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 from AS 2024-08-15 01:38:55,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2945750.0, ans=0.125 2024-08-15 01:38:59,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2945750.0, ans=0.0 2024-08-15 01:39:05,295 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 4750, loss[loss=0.1094, beats_loss=0.01011, ecapa_loss=0.0001488, whisper_loss=0.09783, over 22610.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.0001535, whisper_loss=0.09074, over 3888941.88 frames. ], batch size: 92, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:39:19,425 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.78 vs. 
limit=22.5 2024-08-15 01:39:20,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2945950.0, ans=0.125 2024-08-15 01:39:22,937 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 from AS 2024-08-15 01:39:40,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2946050.0, ans=0.0 2024-08-15 01:39:43,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2946050.0, ans=10.0 2024-08-15 01:39:44,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2946150.0, ans=0.2 2024-08-15 01:39:51,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2946150.0, ans=0.1 2024-08-15 01:39:53,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2946150.0, ans=0.1 2024-08-15 01:40:03,361 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 22 from Vox, 43 from AS 2024-08-15 01:40:04,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2946250.0, ans=0.2 2024-08-15 01:40:14,050 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 4800, loss[loss=0.09749, beats_loss=0.01129, ecapa_loss=0.0001931, whisper_loss=0.08427, over 18395.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01065, ecapa_loss=0.0001548, whisper_loss=0.08998, over 3875828.60 frames. 
], batch size: 79, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:40:28,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2946450.0, ans=0.0 2024-08-15 01:40:53,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2946550.0, ans=0.0 2024-08-15 01:41:07,538 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.617e+01 2.219e+01 2.446e+01 2.733e+01 3.979e+01, threshold=4.892e+01, percent-clipped=0.0 2024-08-15 01:41:08,189 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.64 vs. limit=22.5 2024-08-15 01:41:11,331 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2024-08-15 01:41:13,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2946750.0, ans=0.1 2024-08-15 01:41:18,009 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 14 from LS+wenet, 14 from Vox, 36 from AS 2024-08-15 01:41:19,670 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 18 from Vox, 31 from AS 2024-08-15 01:41:23,088 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 20 from LS+wenet, 22 from Vox, 41 from AS 2024-08-15 01:41:30,765 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 4850, loss[loss=0.08915, beats_loss=0.01175, ecapa_loss=0.0001555, whisper_loss=0.07585, over 21065.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01076, ecapa_loss=0.0001532, whisper_loss=0.08954, over 3878182.46 frames. 
], batch size: 89, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:41:45,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2946950.0, ans=0.125 2024-08-15 01:41:54,553 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 from AS 2024-08-15 01:41:57,180 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 24 from LS+wenet, 17 from Vox, 24 from AS 2024-08-15 01:41:57,595 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.14 vs. limit=15.0 2024-08-15 01:41:59,457 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.28 vs. limit=15.0 2024-08-15 01:42:08,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2947050.0, ans=0.125 2024-08-15 01:42:13,943 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.39 vs. limit=15.0 2024-08-15 01:42:17,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2947150.0, ans=0.0 2024-08-15 01:42:19,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2947150.0, ans=0.125 2024-08-15 01:42:35,948 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.02 vs. limit=10.0 2024-08-15 01:42:49,386 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 4900, loss[loss=0.09411, beats_loss=0.01249, ecapa_loss=0.0001423, whisper_loss=0.0802, over 16103.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01077, ecapa_loss=0.0001519, whisper_loss=0.09005, over 3886153.37 frames. ], batch size: 65, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:43:13,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2947450.0, ans=0.125 2024-08-15 01:43:27,924 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.53 vs. limit=15.0 2024-08-15 01:43:42,816 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.371e+01 2.641e+01 2.917e+01 5.290e+01, threshold=5.283e+01, percent-clipped=1.0 2024-08-15 01:43:51,815 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 15 from Vox, 34 from AS 2024-08-15 01:43:57,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2947750.0, ans=0.015 2024-08-15 01:43:59,367 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 26 from LS+wenet, 17 from Vox, 21 from AS 2024-08-15 01:44:04,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2947850.0, ans=0.0 2024-08-15 01:44:04,846 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 4950, loss[loss=0.1144, beats_loss=0.01023, ecapa_loss=0.0001803, whisper_loss=0.1023, over 23084.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01074, ecapa_loss=0.0001525, whisper_loss=0.09011, over 3880801.96 frames. 
], batch size: 95, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:44:12,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2947850.0, ans=0.2 2024-08-15 01:44:17,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2947850.0, ans=0.2 2024-08-15 01:44:28,109 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 38 from LS+wenet, 25 from Vox, 29 from AS 2024-08-15 01:44:29,621 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 23 from Vox, 35 from AS 2024-08-15 01:44:37,502 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 23 from Vox, 32 from AS 2024-08-15 01:44:49,097 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2024-08-15 01:45:04,297 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 from AS 2024-08-15 01:45:06,606 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 31 from LS+wenet, 21 from Vox, 31 from AS 2024-08-15 01:45:13,210 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 5000, loss[loss=0.1049, beats_loss=0.01039, ecapa_loss=0.0001734, whisper_loss=0.09276, over 21549.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0106, ecapa_loss=0.0001526, whisper_loss=0.09153, over 3892937.65 frames. ], batch size: 89, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:45:14,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2948350.0, ans=0.125 2024-08-15 01:45:15,734 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
23 from LS+wenet, 18 from Vox, 23 from AS 2024-08-15 01:45:19,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2948350.0, ans=0.125 2024-08-15 01:45:23,810 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 20 from LS+wenet, 35 from Vox, 40 from AS 2024-08-15 01:45:34,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2948450.0, ans=0.2 2024-08-15 01:45:58,648 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.280e+01 2.510e+01 2.795e+01 4.349e+01, threshold=5.019e+01, percent-clipped=0.0 2024-08-15 01:45:59,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2948650.0, ans=0.1 2024-08-15 01:46:17,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2948850.0, ans=0.125 2024-08-15 01:46:17,862 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 5050, loss[loss=0.08824, beats_loss=0.01164, ecapa_loss=0.0001816, whisper_loss=0.07479, over 21001.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01066, ecapa_loss=0.0001521, whisper_loss=0.09154, over 3888401.63 frames. ], batch size: 91, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:46:26,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2948850.0, ans=0.2 2024-08-15 01:46:28,983 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
36 from LS+wenet, 28 from Vox, 29 from AS 2024-08-15 01:46:39,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2948950.0, ans=0.1 2024-08-15 01:46:46,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2949050.0, ans=0.0 2024-08-15 01:46:47,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2949050.0, ans=0.125 2024-08-15 01:46:51,089 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 17 from Vox, 24 from AS 2024-08-15 01:47:23,238 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 5100, loss[loss=0.1152, beats_loss=0.00986, ecapa_loss=0.000158, whisper_loss=0.1038, over 20233.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01076, ecapa_loss=0.0001515, whisper_loss=0.09115, over 3891644.06 frames. ], batch size: 81, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:47:37,010 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.54 vs. limit=15.0 2024-08-15 01:47:55,621 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
31 from LS+wenet, 16 from Vox, 40 from AS 2024-08-15 01:48:03,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2949650.0, ans=0.0 2024-08-15 01:48:08,691 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.324e+01 2.645e+01 2.910e+01 4.236e+01, threshold=5.291e+01, percent-clipped=0.0 2024-08-15 01:48:09,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2949650.0, ans=0.125 2024-08-15 01:48:28,119 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 5150, loss[loss=0.1187, beats_loss=0.01078, ecapa_loss=0.0001525, whisper_loss=0.1064, over 23137.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01081, ecapa_loss=0.0001517, whisper_loss=0.0909, over 3918078.24 frames. ], batch size: 92, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:48:33,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2949850.0, ans=0.5 2024-08-15 01:48:36,064 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.342e-01 2024-08-15 01:48:58,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2950050.0, ans=0.2 2024-08-15 01:49:00,498 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 24 from Vox, 34 from AS 2024-08-15 01:49:07,587 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.62 vs. limit=22.5 2024-08-15 01:49:20,032 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
16 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-15 01:49:21,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2950250.0, ans=0.0 2024-08-15 01:49:32,878 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.03 vs. limit=15.0 2024-08-15 01:49:33,245 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 5200, loss[loss=0.1062, beats_loss=0.01093, ecapa_loss=0.0001573, whisper_loss=0.09372, over 20614.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01076, ecapa_loss=0.000151, whisper_loss=0.09109, over 3894336.27 frames. ], batch size: 82, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:49:51,978 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-15 01:49:53,448 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 20 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-15 01:50:14,925 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 27 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-15 01:50:19,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2950650.0, ans=0.0 2024-08-15 01:50:19,927 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.309e+01 2.539e+01 2.841e+01 2.667e+02, threshold=5.077e+01, percent-clipped=2.0 2024-08-15 01:50:23,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2950650.0, ans=0.125 2024-08-15 01:50:40,247 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 5250, loss[loss=0.09408, beats_loss=0.01048, ecapa_loss=0.000172, whisper_loss=0.08189, over 20068.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0107, ecapa_loss=0.0001528, whisper_loss=0.09098, over 3853657.45 frames. 
], batch size: 84, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:50:41,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2950850.0, ans=0.125 2024-08-15 01:50:43,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2950850.0, ans=0.125 2024-08-15 01:50:45,381 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 21 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-15 01:50:51,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2950850.0, ans=0.125 2024-08-15 01:51:16,409 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-15 01:51:30,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2951150.0, ans=0.04949747468305833 2024-08-15 01:51:48,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2951250.0, ans=0.0 2024-08-15 01:51:51,134 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 5300, loss[loss=0.1041, beats_loss=0.01126, ecapa_loss=0.0001351, whisper_loss=0.09148, over 17329.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01065, ecapa_loss=0.0001521, whisper_loss=0.09179, over 3880218.27 frames. 
], batch size: 66, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:52:10,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2951450.0, ans=0.2 2024-08-15 01:52:20,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2951550.0, ans=0.125 2024-08-15 01:52:34,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2951550.0, ans=0.125 2024-08-15 01:52:37,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2951650.0, ans=0.1 2024-08-15 01:52:41,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2951650.0, ans=0.125 2024-08-15 01:52:43,975 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.269e+01 2.498e+01 2.805e+01 4.853e+01, threshold=4.996e+01, percent-clipped=0.0 2024-08-15 01:53:00,331 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.89 vs. limit=15.0 2024-08-15 01:53:07,552 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 5350, loss[loss=0.1202, beats_loss=0.009207, ecapa_loss=0.0001399, whisper_loss=0.1096, over 24143.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01065, ecapa_loss=0.000151, whisper_loss=0.09095, over 3861562.53 frames. 
], batch size: 93, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:53:13,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2951850.0, ans=0.0 2024-08-15 01:53:17,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2951850.0, ans=0.125 2024-08-15 01:53:21,726 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-15 01:53:24,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2951950.0, ans=0.125 2024-08-15 01:53:31,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2951950.0, ans=0.1 2024-08-15 01:53:39,115 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-15 01:53:41,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2952050.0, ans=0.2 2024-08-15 01:53:44,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2952050.0, ans=0.125 2024-08-15 01:53:44,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2952050.0, ans=0.1 2024-08-15 01:53:45,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2952050.0, ans=0.1 2024-08-15 01:53:48,894 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. 
limit=6.0 2024-08-15 01:53:49,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2952050.0, ans=0.125 2024-08-15 01:53:55,773 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 28 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-15 01:54:02,744 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0 2024-08-15 01:54:13,514 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 35 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-15 01:54:26,655 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 5400, loss[loss=0.1011, beats_loss=0.0111, ecapa_loss=0.0001274, whisper_loss=0.08877, over 15126.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01062, ecapa_loss=0.0001513, whisper_loss=0.091, over 3841897.13 frames. ], batch size: 58, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:54:52,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2952450.0, ans=0.125 2024-08-15 01:54:53,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2952450.0, ans=0.1 2024-08-15 01:55:01,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2952550.0, ans=0.125 2024-08-15 01:55:09,280 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 28 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-15 01:55:19,509 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.348e+01 2.605e+01 2.891e+01 6.130e+01, threshold=5.210e+01, percent-clipped=1.0 2024-08-15 01:55:32,984 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
16 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-15 01:55:40,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2952750.0, ans=0.0 2024-08-15 01:55:44,270 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 5450, loss[loss=0.1184, beats_loss=0.009725, ecapa_loss=0.000164, whisper_loss=0.1071, over 15495.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01058, ecapa_loss=0.0001528, whisper_loss=0.09022, over 3809309.66 frames. ], batch size: 64, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:55:50,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2952850.0, ans=0.0 2024-08-15 01:55:56,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2952850.0, ans=0.2 2024-08-15 01:56:04,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2952950.0, ans=0.1 2024-08-15 01:56:26,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2953050.0, ans=0.0 2024-08-15 01:56:33,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2953150.0, ans=0.0 2024-08-15 01:56:49,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2953250.0, ans=0.125 2024-08-15 01:57:03,883 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-15 01:57:06,011 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 5500, loss[loss=0.1018, beats_loss=0.01, ecapa_loss=0.0001466, whisper_loss=0.09036, over 17721.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01068, ecapa_loss=0.0001518, whisper_loss=0.08941, over 3801671.18 frames. 
], batch size: 68, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:57:31,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2953450.0, ans=0.1 2024-08-15 01:57:36,776 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-15 01:57:47,339 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.08 vs. limit=15.0 2024-08-15 01:57:47,958 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 30 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-15 01:57:49,970 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0 2024-08-15 01:58:04,874 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.277e+01 2.522e+01 2.854e+01 1.046e+02, threshold=5.045e+01, percent-clipped=1.0 2024-08-15 01:58:27,360 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 5550, loss[loss=0.0942, beats_loss=0.0102, ecapa_loss=0.0001419, whisper_loss=0.08258, over 19159.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01066, ecapa_loss=0.0001512, whisper_loss=0.09, over 3851089.32 frames. ], batch size: 73, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:58:32,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2953850.0, ans=0.0 2024-08-15 01:59:26,342 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 28 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-15 01:59:47,410 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 5600, loss[loss=0.1001, beats_loss=0.01257, ecapa_loss=0.000161, whisper_loss=0.08594, over 13471.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01071, ecapa_loss=0.0001507, whisper_loss=0.08978, over 3818126.37 frames. 
], batch size: 54, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:59:48,358 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.19 vs. limit=15.0 2024-08-15 01:59:54,233 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 37 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-15 01:59:59,210 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 23 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-15 02:00:09,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2954450.0, ans=10.0 2024-08-15 02:00:10,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2954450.0, ans=0.2 2024-08-15 02:00:13,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2954450.0, ans=0.025 2024-08-15 02:00:14,809 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 29 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-15 02:00:17,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2954450.0, ans=0.2 2024-08-15 02:00:25,602 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.67 vs. limit=15.0 2024-08-15 02:00:43,109 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.248e+01 2.477e+01 2.810e+01 7.862e+01, threshold=4.953e+01, percent-clipped=1.0 2024-08-15 02:00:51,441 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
24 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-15 02:00:56,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2954750.0, ans=0.2 2024-08-15 02:01:06,684 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 5650, loss[loss=0.08498, beats_loss=0.01226, ecapa_loss=0.0001584, whisper_loss=0.07113, over 16140.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01069, ecapa_loss=0.0001503, whisper_loss=0.08986, over 3858678.26 frames. ], batch size: 68, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:01:10,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2954850.0, ans=0.0 2024-08-15 02:01:15,124 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.05 vs. limit=15.0 2024-08-15 02:01:20,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2954950.0, ans=0.1 2024-08-15 02:01:28,016 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-15 02:01:30,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2954950.0, ans=0.125 2024-08-15 02:01:35,156 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
34 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-15 02:01:42,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2955050.0, ans=0.0 2024-08-15 02:01:53,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2955150.0, ans=0.125 2024-08-15 02:02:04,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2955250.0, ans=0.95 2024-08-15 02:02:18,071 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.52 vs. limit=12.0 2024-08-15 02:02:18,566 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 5700, loss[loss=0.1217, beats_loss=0.008549, ecapa_loss=0.0001644, whisper_loss=0.1115, over 22635.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01068, ecapa_loss=0.0001517, whisper_loss=0.09034, over 3891823.80 frames. ], batch size: 92, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:02:20,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2955350.0, ans=0.125 2024-08-15 02:02:23,482 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.81 vs. limit=15.0 2024-08-15 02:02:29,633 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 02:02:36,069 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
25 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-15 02:02:44,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2955550.0, ans=0.125 2024-08-15 02:02:45,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2955550.0, ans=0.125 2024-08-15 02:02:49,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2955550.0, ans=0.5 2024-08-15 02:02:51,946 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-15 02:02:59,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2955650.0, ans=0.0 2024-08-15 02:03:02,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2955650.0, ans=0.1 2024-08-15 02:03:06,368 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.504e+01 2.865e+01 3.291e+01 2.428e+02, threshold=5.731e+01, percent-clipped=5.0 2024-08-15 02:03:27,315 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 5750, loss[loss=0.08996, beats_loss=0.01148, ecapa_loss=0.000172, whisper_loss=0.07675, over 21618.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01057, ecapa_loss=0.0001527, whisper_loss=0.09073, over 3886779.58 frames. ], batch size: 89, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:03:35,985 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 21 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-15 02:03:50,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2955950.0, ans=0.125 2024-08-15 02:03:52,719 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
25 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-15 02:03:55,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2956050.0, ans=0.125 2024-08-15 02:04:04,369 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.63 vs. limit=12.0 2024-08-15 02:04:17,415 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.39 vs. limit=6.0 2024-08-15 02:04:30,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2956250.0, ans=0.125 2024-08-15 02:04:35,542 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 5800, loss[loss=0.09942, beats_loss=0.01042, ecapa_loss=0.0001499, whisper_loss=0.0875, over 16700.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01061, ecapa_loss=0.0001531, whisper_loss=0.09041, over 3863845.11 frames. ], batch size: 63, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:05:22,849 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
18 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-15 02:05:25,421 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.342e+01 2.666e+01 2.998e+01 4.632e+01, threshold=5.332e+01, percent-clipped=0.0 2024-08-15 02:05:29,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2956650.0, ans=0.0 2024-08-15 02:05:29,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2956650.0, ans=0.0 2024-08-15 02:05:37,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2956750.0, ans=0.125 2024-08-15 02:05:42,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2956750.0, ans=0.1 2024-08-15 02:05:45,511 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 5850, loss[loss=0.1008, beats_loss=0.01136, ecapa_loss=0.0001306, whisper_loss=0.08812, over 19784.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0107, ecapa_loss=0.000152, whisper_loss=0.09014, over 3860612.56 frames. ], batch size: 81, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:05:56,981 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.142e+01 2024-08-15 02:05:58,385 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.25 vs. 
limit=15.0 2024-08-15 02:06:01,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2956950.0, ans=0.125 2024-08-15 02:06:15,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2957050.0, ans=0.1 2024-08-15 02:06:16,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2957050.0, ans=0.125 2024-08-15 02:06:32,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2957150.0, ans=0.125 2024-08-15 02:06:33,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2957150.0, ans=0.1 2024-08-15 02:06:43,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2957250.0, ans=0.125 2024-08-15 02:06:44,756 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 20 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-15 02:06:58,330 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-15 02:06:59,392 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 5900, loss[loss=0.1167, beats_loss=0.009607, ecapa_loss=0.0001608, whisper_loss=0.1055, over 14402.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01076, ecapa_loss=0.0001512, whisper_loss=0.09014, over 3889837.94 frames. 
], batch size: 56, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:06:59,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2957350.0, ans=0.0 2024-08-15 02:07:04,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2957350.0, ans=0.07 2024-08-15 02:07:18,508 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 19 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-15 02:07:27,647 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-15 02:07:28,564 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.55 vs. limit=6.0 2024-08-15 02:07:29,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2957550.0, ans=0.125 2024-08-15 02:07:31,877 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-15 02:07:37,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2957550.0, ans=0.1 2024-08-15 02:07:47,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2957650.0, ans=0.1 2024-08-15 02:07:47,829 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0 2024-08-15 02:07:51,960 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
15 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-15 02:07:54,757 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.695e+01 2.296e+01 2.523e+01 2.887e+01 4.052e+01, threshold=5.046e+01, percent-clipped=0.0 2024-08-15 02:07:56,382 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-15 02:08:05,462 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-15 02:08:15,768 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 5950, loss[loss=0.1107, beats_loss=0.01126, ecapa_loss=0.0001502, whisper_loss=0.09796, over 20715.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01081, ecapa_loss=0.0001501, whisper_loss=0.08991, over 3904106.48 frames. ], batch size: 83, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:08:17,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2957850.0, ans=0.125 2024-08-15 02:08:21,106 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-15 02:08:25,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2957850.0, ans=0.2 2024-08-15 02:08:29,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2957950.0, ans=0.1 2024-08-15 02:08:37,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2957950.0, ans=0.0 2024-08-15 02:08:47,269 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 02:09:01,813 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
22 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-15 02:09:07,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2958150.0, ans=0.2 2024-08-15 02:09:40,777 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 6000, loss[loss=0.09882, beats_loss=0.01041, ecapa_loss=0.000175, whisper_loss=0.08666, over 15628.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01082, ecapa_loss=0.0001509, whisper_loss=0.08988, over 3928918.26 frames. ], batch size: 62, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:09:40,778 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-15 02:10:47,984 INFO [train_multi_KD3.py:1149] (2/4) Epoch 21, validation on ASR_libri: loss=0.2535, beats_loss=0, ecapa_loss=0.0005526, whisper_loss=0.2479, over 922467.00 frames. 2024-08-15 02:11:14,065 INFO [train_multi_KD3.py:1149] (2/4) Epoch 21, validation on SV_voxceleb1: loss=0.004315, beats_loss=0, ecapa_loss=0.0004315, whisper_loss=0, over 939242.00 frames. 2024-08-15 02:13:57,956 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.9267, 1.6018, 2.0220, 0.9213], device='cuda:2') 2024-08-15 02:14:20,502 INFO [train_multi_KD3.py:1149] (2/4) Epoch 21, validation on AT_audioset: loss=0.0235, beats_loss=0.0235, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 02:14:20,506 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-15 02:14:20,713 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 27 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-15 02:14:22,832 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.47 vs. 
limit=22.5 2024-08-15 02:14:41,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2958450.0, ans=0.07 2024-08-15 02:14:55,145 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-15 02:15:07,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2958550.0, ans=0.0 2024-08-15 02:15:08,957 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 24 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-15 02:15:16,884 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-15 02:15:17,968 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.230e+01 2.610e+01 2.865e+01 2.775e+02, threshold=5.221e+01, percent-clipped=3.0 2024-08-15 02:15:19,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2958650.0, ans=0.1 2024-08-15 02:15:27,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2958750.0, ans=0.125 2024-08-15 02:15:38,021 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 6050, loss[loss=0.1426, beats_loss=0.007427, ecapa_loss=0.0001599, whisper_loss=0.1336, over 22145.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01068, ecapa_loss=0.0001518, whisper_loss=0.09166, over 3912228.24 frames. 
], batch size: 88, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:15:38,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2958850.0, ans=0.125 2024-08-15 02:15:52,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2958950.0, ans=0.125 2024-08-15 02:16:16,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2959150.0, ans=0.1 2024-08-15 02:16:18,855 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 21 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-15 02:16:20,283 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-15 02:16:20,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2959150.0, ans=0.1 2024-08-15 02:16:28,996 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 14 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-15 02:16:29,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=2959250.0, ans=15.0 2024-08-15 02:16:41,418 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-15 02:16:43,705 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 6100, loss[loss=0.09309, beats_loss=0.01097, ecapa_loss=0.0001453, whisper_loss=0.08067, over 20713.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01069, ecapa_loss=0.0001517, whisper_loss=0.09145, over 3869779.34 frames. ], batch size: 84, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:16:53,111 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
28 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 02:16:57,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2959450.0, ans=0.125 2024-08-15 02:17:29,581 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.268e+01 2.472e+01 3.000e+01 1.337e+02, threshold=4.943e+01, percent-clipped=1.0 2024-08-15 02:17:31,013 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 02:17:33,552 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-15 02:17:39,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2959750.0, ans=0.125 2024-08-15 02:17:49,710 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 6150, loss[loss=0.0977, beats_loss=0.01007, ecapa_loss=0.0001603, whisper_loss=0.08603, over 20000.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01068, ecapa_loss=0.0001527, whisper_loss=0.09158, over 3887539.52 frames. ], batch size: 81, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:17:50,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2959850.0, ans=0.2 2024-08-15 02:17:58,744 WARNING [optim.py:496] (2/4) Scaling gradients by 0.09473193436861038, model_norm_threshold=49.43223190307617 2024-08-15 02:17:58,922 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.0.norm.log_scale with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.424e+04, grad_sumsq=2.424e+04, orig_rms_sq=1.000e+00 2024-08-15 02:18:08,401 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
23 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-15 02:18:24,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2960050.0, ans=0.1 2024-08-15 02:18:40,154 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-15 02:18:41,477 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-15 02:18:50,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2960250.0, ans=0.125 2024-08-15 02:19:01,158 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 6200, loss[loss=0.1038, beats_loss=0.009958, ecapa_loss=0.0001204, whisper_loss=0.09266, over 19787.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01062, ecapa_loss=0.000153, whisper_loss=0.09168, over 3904798.42 frames. ], batch size: 73, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:19:08,127 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-15 02:19:19,036 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-15 02:19:19,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2960450.0, ans=0.2 2024-08-15 02:19:29,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2960550.0, ans=0.0 2024-08-15 02:19:49,398 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.384e+01 2.613e+01 3.033e+01 5.218e+02, threshold=5.226e+01, percent-clipped=4.0 2024-08-15 02:19:49,618 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
22 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-15 02:20:04,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2960750.0, ans=0.125 2024-08-15 02:20:08,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2960850.0, ans=0.125 2024-08-15 02:20:09,423 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 6250, loss[loss=0.08356, beats_loss=0.01017, ecapa_loss=0.000155, whisper_loss=0.07184, over 14982.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01058, ecapa_loss=0.0001531, whisper_loss=0.09151, over 3900528.60 frames. ], batch size: 59, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:20:14,575 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-15 02:20:50,581 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.44 vs. limit=22.5 2024-08-15 02:21:14,553 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2024-08-15 02:21:17,831 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 6300, loss[loss=0.0966, beats_loss=0.01075, ecapa_loss=0.0001474, whisper_loss=0.08438, over 22919.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01064, ecapa_loss=0.000152, whisper_loss=0.09164, over 3892920.63 frames. 
], batch size: 91, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:21:20,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2961350.0, ans=0.0 2024-08-15 02:21:37,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2961450.0, ans=0.125 2024-08-15 02:22:04,209 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.318e+01 2.564e+01 2.783e+01 4.377e+01, threshold=5.129e+01, percent-clipped=0.0 2024-08-15 02:22:23,228 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 6350, loss[loss=0.1138, beats_loss=0.009803, ecapa_loss=0.0001391, whisper_loss=0.1026, over 20401.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01066, ecapa_loss=0.0001524, whisper_loss=0.09113, over 3878384.00 frames. ], batch size: 79, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:22:25,834 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 02:22:37,286 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-15 02:22:44,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2961950.0, ans=0.125 2024-08-15 02:22:55,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2962050.0, ans=0.07 2024-08-15 02:23:05,910 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.15 vs. limit=15.0 2024-08-15 02:23:20,291 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
15 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-15 02:23:21,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2962250.0, ans=0.0 2024-08-15 02:23:29,210 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 6400, loss[loss=0.08665, beats_loss=0.01293, ecapa_loss=0.000145, whisper_loss=0.07227, over 22382.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01075, ecapa_loss=0.0001524, whisper_loss=0.09137, over 3905199.25 frames. ], batch size: 91, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:23:35,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2962350.0, ans=0.0 2024-08-15 02:23:55,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2962550.0, ans=0.1 2024-08-15 02:24:14,874 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.481e+01 2.409e+01 2.750e+01 3.071e+01 4.179e+02, threshold=5.499e+01, percent-clipped=4.0 2024-08-15 02:24:16,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2962650.0, ans=0.05 2024-08-15 02:24:17,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2962650.0, ans=0.1 2024-08-15 02:24:33,200 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 16 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-15 02:24:34,427 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 6450, loss[loss=0.1032, beats_loss=0.01004, ecapa_loss=0.0001288, whisper_loss=0.09184, over 14110.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01069, ecapa_loss=0.0001538, whisper_loss=0.09158, over 3924051.11 frames. 
], batch size: 53, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:24:38,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2962850.0, ans=0.0 2024-08-15 02:24:59,943 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 23 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-15 02:25:03,839 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-15 02:25:11,362 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.99 vs. limit=15.0 2024-08-15 02:25:18,916 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.35 vs. limit=15.0 2024-08-15 02:25:19,629 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-15 02:25:25,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2963150.0, ans=0.125 2024-08-15 02:25:30,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2963250.0, ans=0.125 2024-08-15 02:25:32,364 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.31 vs. limit=15.0 2024-08-15 02:25:35,004 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.87 vs. limit=12.0 2024-08-15 02:25:40,486 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 6500, loss[loss=0.09217, beats_loss=0.01254, ecapa_loss=0.0001907, whisper_loss=0.07773, over 20306.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01064, ecapa_loss=0.0001538, whisper_loss=0.09205, over 3918387.89 frames. 
], batch size: 87, lr: 2.97e-03, grad_scale: 1.152921504606847e+18 2024-08-15 02:26:07,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2963550.0, ans=0.125 2024-08-15 02:26:08,070 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.67 vs. limit=15.0 2024-08-15 02:26:08,847 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 15 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-15 02:26:10,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2963550.0, ans=0.1 2024-08-15 02:26:16,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2963550.0, ans=0.125 2024-08-15 02:26:19,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2963650.0, ans=0.04949747468305833 2024-08-15 02:26:26,996 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.262e+01 2.539e+01 2.761e+01 6.579e+01, threshold=5.077e+01, percent-clipped=1.0 2024-08-15 02:26:31,119 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 02:26:33,019 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.03 vs. limit=15.0 2024-08-15 02:26:46,362 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 6550, loss[loss=0.1133, beats_loss=0.01047, ecapa_loss=0.000162, whisper_loss=0.1013, over 21326.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01068, ecapa_loss=0.0001532, whisper_loss=0.09173, over 3941400.28 frames. 
], batch size: 89, lr: 2.97e-03, grad_scale: 1.152921504606847e+18 2024-08-15 02:26:47,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2963850.0, ans=0.125 2024-08-15 02:26:55,948 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-15 02:27:01,394 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.67 vs. limit=15.0 2024-08-15 02:27:04,678 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-15 02:27:06,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2963950.0, ans=0.0 2024-08-15 02:27:08,777 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 15 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-15 02:27:13,516 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.28 vs. limit=15.0 2024-08-15 02:27:24,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2964150.0, ans=0.95 2024-08-15 02:27:39,706 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.35 vs. limit=10.0 2024-08-15 02:27:43,995 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.33 vs. 
limit=15.0 2024-08-15 02:27:45,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2964250.0, ans=0.125 2024-08-15 02:27:50,830 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 6600, loss[loss=0.09857, beats_loss=0.01177, ecapa_loss=0.000152, whisper_loss=0.08528, over 23215.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01068, ecapa_loss=0.000154, whisper_loss=0.0917, over 3936532.29 frames. ], batch size: 96, lr: 2.97e-03, grad_scale: 1.152921504606847e+18 2024-08-15 02:27:59,618 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-15 02:28:25,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2964550.0, ans=0.05 2024-08-15 02:28:28,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2964650.0, ans=0.0 2024-08-15 02:28:31,071 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 02:28:35,632 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.343e+01 2.654e+01 2.960e+01 4.414e+01, threshold=5.309e+01, percent-clipped=0.0 2024-08-15 02:28:42,769 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-15 02:28:55,382 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 6650, loss[loss=0.1058, beats_loss=0.01198, ecapa_loss=0.0001512, whisper_loss=0.09233, over 22401.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01062, ecapa_loss=0.0001548, whisper_loss=0.09201, over 3936214.66 frames. 
], batch size: 91, lr: 2.97e-03, grad_scale: 1.152921504606847e+18 2024-08-15 02:29:30,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2965050.0, ans=0.125 2024-08-15 02:29:31,826 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2965050.0, ans=0.125 2024-08-15 02:29:38,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2965150.0, ans=0.1 2024-08-15 02:29:39,464 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-15 02:29:42,887 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0 2024-08-15 02:29:43,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2965150.0, ans=0.1 2024-08-15 02:29:49,341 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.20 vs. limit=6.0 2024-08-15 02:29:55,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2965250.0, ans=0.125 2024-08-15 02:29:56,865 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-15 02:30:01,849 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 6700, loss[loss=0.1146, beats_loss=0.008951, ecapa_loss=0.0001475, whisper_loss=0.1041, over 15003.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01061, ecapa_loss=0.000154, whisper_loss=0.09209, over 3914299.27 frames. 
], batch size: 60, lr: 2.97e-03, grad_scale: 1.152921504606847e+18 2024-08-15 02:30:05,615 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=15.0 2024-08-15 02:30:12,316 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.36 vs. limit=15.0 2024-08-15 02:30:13,235 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 17 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-15 02:30:14,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2965450.0, ans=0.0 2024-08-15 02:30:32,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2965550.0, ans=0.1 2024-08-15 02:30:33,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2965550.0, ans=0.125 2024-08-15 02:30:34,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2965550.0, ans=0.2 2024-08-15 02:30:35,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2965550.0, ans=0.125 2024-08-15 02:30:39,806 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-15 02:30:42,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2965650.0, ans=0.1 2024-08-15 02:30:51,825 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.368e+01 2.669e+01 2.995e+01 9.040e+01, threshold=5.338e+01, percent-clipped=3.0 2024-08-15 02:30:52,065 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
17 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-15 02:31:07,203 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.33 vs. limit=12.0 2024-08-15 02:31:08,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2965750.0, ans=0.0 2024-08-15 02:31:14,447 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 6750, loss[loss=0.1065, beats_loss=0.01054, ecapa_loss=0.0001627, whisper_loss=0.09434, over 14183.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0106, ecapa_loss=0.0001549, whisper_loss=0.09171, over 3871011.52 frames. ], batch size: 56, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:31:41,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2965950.0, ans=0.125 2024-08-15 02:32:07,469 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-15 02:32:10,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2966150.0, ans=0.125 2024-08-15 02:32:18,714 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.788e-02 2024-08-15 02:32:29,972 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 6800, loss[loss=0.0836, beats_loss=0.01074, ecapa_loss=0.0001984, whisper_loss=0.07087, over 21067.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01064, ecapa_loss=0.0001547, whisper_loss=0.09105, over 3868314.44 frames. 
], batch size: 92, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:32:36,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2966350.0, ans=0.1 2024-08-15 02:32:50,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2966450.0, ans=0.2 2024-08-15 02:32:54,813 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-15 02:32:58,730 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 25 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-15 02:33:02,309 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.196e+05 2024-08-15 02:33:03,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2966550.0, ans=0.125 2024-08-15 02:33:11,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2966550.0, ans=0.125 2024-08-15 02:33:18,357 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.33 vs. limit=15.0 2024-08-15 02:33:25,182 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.273e+01 2.576e+01 2.834e+01 4.792e+01, threshold=5.152e+01, percent-clipped=0.0 2024-08-15 02:33:28,624 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 18 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-15 02:33:32,883 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 20 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-15 02:33:34,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2966750.0, ans=0.07 2024-08-15 02:33:38,873 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
23 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-15 02:33:46,680 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 6850, loss[loss=0.08909, beats_loss=0.01239, ecapa_loss=0.0001146, whisper_loss=0.07555, over 21351.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01063, ecapa_loss=0.0001537, whisper_loss=0.09043, over 3854049.95 frames. ], batch size: 84, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:33:51,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2966850.0, ans=0.0 2024-08-15 02:34:16,600 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.40 vs. limit=22.5 2024-08-15 02:34:29,867 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-15 02:34:31,120 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 24 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-15 02:34:37,972 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 19 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-15 02:34:41,505 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-15 02:34:45,498 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.32 vs. limit=22.5 2024-08-15 02:35:05,684 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 6900, loss[loss=0.1305, beats_loss=0.009287, ecapa_loss=0.000151, whisper_loss=0.1197, over 22724.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01069, ecapa_loss=0.0001531, whisper_loss=0.09042, over 3829398.76 frames. ], batch size: 88, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:35:09,031 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
19 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-15 02:35:39,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2967550.0, ans=0.125 2024-08-15 02:35:40,254 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-15 02:35:44,980 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 26 from Vox, 22 fro AS 2024-08-15 02:35:51,048 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 32 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-15 02:36:02,119 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.306e+01 2.522e+01 2.757e+01 3.704e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-15 02:36:07,180 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 14 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-15 02:36:09,134 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.79 vs. limit=15.0 2024-08-15 02:36:18,146 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 11 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-15 02:36:24,115 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 6950, loss[loss=0.08131, beats_loss=0.01101, ecapa_loss=0.0001976, whisper_loss=0.06832, over 18658.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01074, ecapa_loss=0.0001536, whisper_loss=0.08996, over 3806897.46 frames. ], batch size: 83, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:36:24,290 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-15 02:36:25,977 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
23 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 02:36:28,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2967850.0, ans=0.05 2024-08-15 02:36:33,918 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 28 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-15 02:36:46,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2967950.0, ans=0.0 2024-08-15 02:37:02,587 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 13 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-15 02:37:22,464 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-15 02:37:40,289 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 7000, loss[loss=0.07019, beats_loss=0.01291, ecapa_loss=0.0001861, whisper_loss=0.05542, over 18308.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01075, ecapa_loss=0.0001534, whisper_loss=0.09004, over 3822910.71 frames. ], batch size: 80, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:37:46,339 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
35 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-15 02:38:33,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2968650.0, ans=0.0 2024-08-15 02:38:38,368 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.236e+01 2.500e+01 2.764e+01 4.319e+01, threshold=5.000e+01, percent-clipped=0.0 2024-08-15 02:38:42,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2968650.0, ans=0.2 2024-08-15 02:38:46,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2968750.0, ans=0.125 2024-08-15 02:38:54,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2968750.0, ans=0.1 2024-08-15 02:38:57,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2968750.0, ans=0.0 2024-08-15 02:39:00,314 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 7050, loss[loss=0.1112, beats_loss=0.009635, ecapa_loss=0.0001416, whisper_loss=0.1002, over 21788.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01072, ecapa_loss=0.0001531, whisper_loss=0.09067, over 3851350.16 frames. ], batch size: 87, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:39:10,664 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.82 vs. 
limit=6.0 2024-08-15 02:39:11,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2968850.0, ans=0.04949747468305833 2024-08-15 02:39:17,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2968950.0, ans=0.0 2024-08-15 02:39:29,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2968950.0, ans=0.1 2024-08-15 02:39:40,124 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.87 vs. limit=5.0 2024-08-15 02:39:49,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2969150.0, ans=0.125 2024-08-15 02:39:58,972 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-15 02:40:20,747 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 7100, loss[loss=0.08174, beats_loss=0.01071, ecapa_loss=0.0001714, whisper_loss=0.06932, over 14159.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01075, ecapa_loss=0.0001517, whisper_loss=0.09016, over 3822845.49 frames. ], batch size: 58, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:40:22,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2969350.0, ans=6.0 2024-08-15 02:40:27,961 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 33 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-15 02:40:36,752 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.56 vs. 
limit=22.5 2024-08-15 02:40:53,860 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.65 vs. limit=10.0 2024-08-15 02:41:01,998 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0 2024-08-15 02:41:05,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2969550.0, ans=0.1 2024-08-15 02:41:11,714 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 14 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-15 02:41:15,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2969650.0, ans=0.0 2024-08-15 02:41:19,181 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.324e+01 2.523e+01 2.719e+01 3.184e+02, threshold=5.045e+01, percent-clipped=4.0 2024-08-15 02:41:42,021 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 7150, loss[loss=0.1148, beats_loss=0.007778, ecapa_loss=0.0001832, whisper_loss=0.1052, over 21644.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01074, ecapa_loss=0.000151, whisper_loss=0.09102, over 3855712.21 frames. ], batch size: 84, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:41:43,898 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.72 vs. limit=15.0 2024-08-15 02:42:06,615 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 31 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-15 02:42:07,858 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
37 from LS+wenet, 18 from Vox, 36 from AS 2024-08-15 02:42:11,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2969950.0, ans=0.125 2024-08-15 02:42:13,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2970050.0, ans=0.0 2024-08-15 02:42:16,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2970050.0, ans=0.1 2024-08-15 02:42:16,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2970050.0, ans=0.125 2024-08-15 02:42:22,102 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 40 from LS+wenet, 17 from Vox, 31 from AS 2024-08-15 02:42:28,733 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 22 from LS+wenet, 13 from Vox, 21 from AS 2024-08-15 02:42:40,257 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 from AS 2024-08-15 02:42:51,251 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=7.842e+00 2024-08-15 02:42:52,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2970250.0, ans=0.04949747468305833 2024-08-15 02:43:03,724 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 7200, loss[loss=0.09549, beats_loss=0.009702, ecapa_loss=0.0001609, whisper_loss=0.08418, over 18936.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01068, ecapa_loss=0.0001521, whisper_loss=0.09132, over 3871815.10 frames. ], batch size: 77, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:43:03,935 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
30 from LS+wenet, 19 from Vox, 41 from AS 2024-08-15 02:43:19,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2970450.0, ans=0.1 2024-08-15 02:43:27,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2970450.0, ans=0.0 2024-08-15 02:43:31,491 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 19 from Vox, 32 from AS 2024-08-15 02:43:57,687 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 16 from Vox, 31 from AS 2024-08-15 02:44:03,145 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.642e+01 2.341e+01 2.613e+01 2.912e+01 4.502e+01, threshold=5.226e+01, percent-clipped=0.0 2024-08-15 02:44:24,541 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 7250, loss[loss=0.1208, beats_loss=0.01219, ecapa_loss=0.0001275, whisper_loss=0.1073, over 23077.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01073, ecapa_loss=0.0001524, whisper_loss=0.09081, over 3905710.80 frames. ], batch size: 90, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:44:28,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2970850.0, ans=0.125 2024-08-15 02:44:33,954 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.98 vs. limit=15.0 2024-08-15 02:44:41,980 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 18 from Vox, 29 from AS 2024-08-15 02:44:46,374 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 from AS 2024-08-15 02:45:01,772 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.24 vs. 
limit=6.0 2024-08-15 02:45:09,571 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 from AS 2024-08-15 02:45:16,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2971150.0, ans=0.2 2024-08-15 02:45:24,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2971150.0, ans=0.0 2024-08-15 02:45:41,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2971250.0, ans=0.025 2024-08-15 02:45:45,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2971250.0, ans=0.0 2024-08-15 02:45:47,115 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 7300, loss[loss=0.1275, beats_loss=0.008453, ecapa_loss=0.0001538, whisper_loss=0.1176, over 23417.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01066, ecapa_loss=0.0001524, whisper_loss=0.09184, over 3910862.94 frames. ], batch size: 90, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:46:11,747 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
18 from LS+wenet, 19 from Vox, 24 from AS 2024-08-15 02:46:36,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2971650.0, ans=0.125 2024-08-15 02:46:46,680 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.342e+01 2.606e+01 2.963e+01 2.884e+02, threshold=5.213e+01, percent-clipped=2.0 2024-08-15 02:46:52,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2971750.0, ans=0.125 2024-08-15 02:47:07,060 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.688e-02 2024-08-15 02:47:09,935 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 7350, loss[loss=0.1171, beats_loss=0.01206, ecapa_loss=0.0001575, whisper_loss=0.1035, over 22640.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01067, ecapa_loss=0.0001517, whisper_loss=0.09114, over 3876112.28 frames. ], batch size: 91, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:47:20,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2971850.0, ans=0.125 2024-08-15 02:47:21,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2971850.0, ans=0.125 2024-08-15 02:47:30,231 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 25 from Vox, 25 from AS 2024-08-15 02:47:35,449 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.34 vs. limit=15.0 2024-08-15 02:47:39,819 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 28 from Vox, 34 from AS 2024-08-15 02:47:47,839 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
23 from LS+wenet, 26 from Vox, 39 from AS 2024-08-15 02:47:48,675 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.75 vs. limit=8.0 2024-08-15 02:47:58,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2972150.0, ans=0.2 2024-08-15 02:48:23,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2972250.0, ans=0.125 2024-08-15 02:48:31,241 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 19 from Vox, 33 from AS 2024-08-15 02:48:32,594 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 7400, loss[loss=0.09743, beats_loss=0.0122, ecapa_loss=0.0001401, whisper_loss=0.08383, over 17736.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01073, ecapa_loss=0.0001517, whisper_loss=0.09079, over 3869857.86 frames. ], batch size: 72, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:48:33,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2972350.0, ans=0.125 2024-08-15 02:48:34,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2972350.0, ans=0.1 2024-08-15 02:49:03,539 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 35 from LS+wenet, 16 from Vox, 33 from AS 2024-08-15 02:49:23,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2972650.0, ans=0.0 2024-08-15 02:49:26,900 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
23 from LS+wenet, 16 from Vox, 35 from AS 2024-08-15 02:49:31,638 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.322e+01 2.605e+01 2.983e+01 4.527e+01, threshold=5.211e+01, percent-clipped=0.0 2024-08-15 02:49:34,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2972650.0, ans=0.0 2024-08-15 02:49:42,286 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 19 from LS+wenet, 20 from Vox, 38 from AS 2024-08-15 02:49:43,767 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.12 vs. limit=15.0 2024-08-15 02:49:53,891 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 7450, loss[loss=0.1011, beats_loss=0.01128, ecapa_loss=0.0001461, whisper_loss=0.08841, over 21686.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01074, ecapa_loss=0.0001527, whisper_loss=0.09082, over 3875323.40 frames. ], batch size: 91, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:50:15,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2972950.0, ans=0.125 2024-08-15 02:50:16,383 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 27 from LS+wenet, 25 from Vox, 28 from AS 2024-08-15 02:50:24,908 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 12 from Vox, 37 from AS 2024-08-15 02:50:28,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2973050.0, ans=0.125 2024-08-15 02:50:29,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=2973050.0, ans=15.0 2024-08-15 02:50:30,781 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 23 from LS+wenet, 26 from Vox, 45 from AS 2024-08-15 02:50:47,089 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
20 from LS+wenet, 18 from Vox, 27 from AS 2024-08-15 02:50:55,548 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 13 from LS+wenet, 18 from Vox, 22 from AS 2024-08-15 02:50:57,696 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 02:50:59,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2973250.0, ans=0.125 2024-08-15 02:51:16,925 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 7500, loss[loss=0.09971, beats_loss=0.01087, ecapa_loss=0.0001804, whisper_loss=0.08704, over 14934.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01076, ecapa_loss=0.0001526, whisper_loss=0.09061, over 3884695.86 frames. ], batch size: 65, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:51:21,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2973350.0, ans=0.0 2024-08-15 02:51:26,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2973350.0, ans=0.0 2024-08-15 02:51:27,623 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 25 from LS+wenet, 18 from Vox, 20 from AS 2024-08-15 02:51:42,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2973450.0, ans=0.125 2024-08-15 02:51:50,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2973550.0, ans=0.125 2024-08-15 02:52:15,814 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.356e+01 2.622e+01 2.952e+01 4.347e+01, threshold=5.245e+01, percent-clipped=0.0 2024-08-15 02:52:17,466 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
17 from LS+wenet, 10 from Vox, 32 from AS 2024-08-15 02:52:18,149 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.20 vs. limit=15.0 2024-08-15 02:52:19,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2973650.0, ans=0.125 2024-08-15 02:52:19,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2973650.0, ans=0.2 2024-08-15 02:52:22,965 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 22 from LS+wenet, 22 from Vox, 39 from AS 2024-08-15 02:52:34,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2973750.0, ans=0.125 2024-08-15 02:52:39,011 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 7550, loss[loss=0.1051, beats_loss=0.009628, ecapa_loss=0.0001476, whisper_loss=0.09399, over 22816.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01068, ecapa_loss=0.0001535, whisper_loss=0.09093, over 3875892.99 frames. ], batch size: 90, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:53:07,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2973950.0, ans=0.1 2024-08-15 02:53:58,681 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 7600, loss[loss=0.1107, beats_loss=0.01029, ecapa_loss=0.0001523, whisper_loss=0.09886, over 22588.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01063, ecapa_loss=0.0001531, whisper_loss=0.09036, over 3835455.93 frames. 
], batch size: 89, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:54:02,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2974350.0, ans=0.0 2024-08-15 02:54:06,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=2974350.0, ans=0.95 2024-08-15 02:54:06,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2974350.0, ans=0.125 2024-08-15 02:54:14,007 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 from AS 2024-08-15 02:54:19,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2974450.0, ans=0.0 2024-08-15 02:54:22,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2974450.0, ans=0.1 2024-08-15 02:54:29,666 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 from AS 2024-08-15 02:54:30,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2974550.0, ans=0.07 2024-08-15 02:54:34,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2974550.0, ans=0.1 2024-08-15 02:54:45,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2974650.0, ans=0.0 2024-08-15 02:54:52,954 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
31 from LS+wenet, 22 from Vox, 38 from AS 2024-08-15 02:54:53,282 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 02:54:53,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2974650.0, ans=0.0 2024-08-15 02:54:53,813 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.96 vs. limit=15.0 2024-08-15 02:54:55,789 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.311e+01 2.587e+01 3.162e+01 4.205e+02, threshold=5.175e+01, percent-clipped=3.0 2024-08-15 02:55:03,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2974750.0, ans=0.0 2024-08-15 02:55:12,288 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 41 from LS+wenet, 24 from Vox, 29 from AS 2024-08-15 02:55:17,922 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 7650, loss[loss=0.08737, beats_loss=0.009465, ecapa_loss=0.0001786, whisper_loss=0.07612, over 20193.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01059, ecapa_loss=0.0001532, whisper_loss=0.09027, over 3820193.87 frames. ], batch size: 82, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:55:18,117 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 16 from Vox, 45 from AS 2024-08-15 02:55:26,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2974850.0, ans=0.05 2024-08-15 02:55:34,060 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 from AS 2024-08-15 02:56:35,093 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 7700, loss[loss=0.104, beats_loss=0.01008, ecapa_loss=0.0001342, whisper_loss=0.09259, over 19305.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.0105, ecapa_loss=0.000153, whisper_loss=0.09133, over 3867664.12 frames. ], batch size: 73, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:56:48,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2975350.0, ans=0.0 2024-08-15 02:56:52,323 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 27 from LS+wenet, 17 from Vox, 28 from AS 2024-08-15 02:57:24,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2975650.0, ans=0.0 2024-08-15 02:57:31,901 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.248e+01 2.489e+01 2.817e+01 2.674e+02, threshold=4.978e+01, percent-clipped=0.0 2024-08-15 02:57:33,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2975650.0, ans=0.1 2024-08-15 02:57:46,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2975750.0, ans=0.125 2024-08-15 02:57:52,833 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 7750, loss[loss=0.1261, beats_loss=0.008996, ecapa_loss=0.0001793, whisper_loss=0.1153, over 23043.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01055, ecapa_loss=0.0001529, whisper_loss=0.09101, over 3869947.94 frames. ], batch size: 92, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:57:53,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2975850.0, ans=0.1 2024-08-15 02:58:00,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2975850.0, ans=0.05 2024-08-15 02:58:05,743 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
31 from LS+wenet, 21 from Vox, 40 from AS 2024-08-15 02:58:09,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2975950.0, ans=0.2 2024-08-15 02:58:18,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2975950.0, ans=0.1 2024-08-15 02:58:30,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2976050.0, ans=0.0 2024-08-15 02:58:31,131 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 24 from Vox, 37 from AS 2024-08-15 02:58:37,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2976050.0, ans=0.0 2024-08-15 02:58:43,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2976150.0, ans=0.0 2024-08-15 02:58:50,714 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 17 from Vox, 32 from AS 2024-08-15 02:59:06,975 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 15 from Vox, 33 from AS 2024-08-15 02:59:09,842 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 7800, loss[loss=0.09084, beats_loss=0.01025, ecapa_loss=0.0001524, whisper_loss=0.07907, over 20097.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01054, ecapa_loss=0.0001526, whisper_loss=0.09082, over 3863558.20 frames. ], batch size: 78, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:59:10,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2976350.0, ans=0.125 2024-08-15 02:59:22,477 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
23 from LS+wenet, 26 from Vox, 40 from AS 2024-08-15 02:59:53,410 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.106e+00 2024-08-15 02:59:53,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2976550.0, ans=0.125 2024-08-15 03:00:00,425 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 23 from Vox, 24 from AS 2024-08-15 03:00:03,224 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 from AS 2024-08-15 03:00:06,182 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.382e+01 2.617e+01 2.970e+01 1.321e+02, threshold=5.235e+01, percent-clipped=4.0 2024-08-15 03:00:30,881 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 7850, loss[loss=0.1055, beats_loss=0.01239, ecapa_loss=0.0001228, whisper_loss=0.09192, over 17346.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01057, ecapa_loss=0.0001522, whisper_loss=0.09083, over 3865685.39 frames. ], batch size: 69, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:00:36,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=2976850.0, ans=0.05 2024-08-15 03:00:36,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2976850.0, ans=0.0 2024-08-15 03:00:38,743 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 28 from LS+wenet, 16 from Vox, 35 from AS 2024-08-15 03:01:19,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2977150.0, ans=0.125 2024-08-15 03:01:21,360 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 21 from Vox, 23 from AS 2024-08-15 03:01:22,963 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 
27 from LS+wenet, 24 from Vox, 45 from AS 2024-08-15 03:01:25,867 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.76 vs. limit=22.5 2024-08-15 03:01:27,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2977150.0, ans=0.125 2024-08-15 03:01:45,354 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.16 vs. limit=22.5 2024-08-15 03:01:53,659 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 7900, loss[loss=0.1193, beats_loss=0.01101, ecapa_loss=0.0001498, whisper_loss=0.1068, over 17909.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01071, ecapa_loss=0.0001518, whisper_loss=0.09046, over 3856131.51 frames. ], batch size: 68, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:02:05,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2977350.0, ans=0.1 2024-08-15 03:02:14,107 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
15 from LS+wenet, 19 from Vox, 34 from AS 2024-08-15 03:02:23,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2977450.0, ans=0.125 2024-08-15 03:02:53,962 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.325e+01 2.726e+01 3.089e+01 1.885e+02, threshold=5.452e+01, percent-clipped=1.0 2024-08-15 03:03:05,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2977750.0, ans=0.1 2024-08-15 03:03:07,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2977750.0, ans=0.0 2024-08-15 03:03:12,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2977750.0, ans=0.2 2024-08-15 03:03:15,702 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 7950, loss[loss=0.1055, beats_loss=0.009768, ecapa_loss=0.00014, whisper_loss=0.09436, over 22350.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01075, ecapa_loss=0.000152, whisper_loss=0.09047, over 3866563.13 frames. ], batch size: 88, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:03:18,263 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.83 vs. limit=15.0 2024-08-15 03:03:22,404 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 25 from LS+wenet, 14 from Vox, 28 from AS 2024-08-15 03:03:24,125 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 15 from Vox, 25 from AS 2024-08-15 03:03:42,496 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.17 vs. limit=15.0 2024-08-15 03:03:43,172 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
32 from LS+wenet, 19 from Vox, 42 from AS 2024-08-15 03:03:52,602 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 29 from LS+wenet, 12 from Vox, 36 from AS 2024-08-15 03:03:54,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2978050.0, ans=0.0 2024-08-15 03:04:37,483 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 8000, loss[loss=0.1106, beats_loss=0.00978, ecapa_loss=0.0001392, whisper_loss=0.09939, over 21900.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01069, ecapa_loss=0.0001517, whisper_loss=0.09098, over 3863973.45 frames. ], batch size: 84, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:04:42,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2978350.0, ans=0.0 2024-08-15 03:04:57,091 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 23 from Vox, 27 from AS 2024-08-15 03:05:16,187 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 28 from Vox, 28 from AS 2024-08-15 03:05:18,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2978550.0, ans=0.125 2024-08-15 03:05:29,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2978650.0, ans=0.2 2024-08-15 03:05:35,828 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.399e+01 2.701e+01 3.155e+01 4.080e+02, threshold=5.401e+01, percent-clipped=3.0 2024-08-15 03:05:36,077 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
30 from LS+wenet, 19 from Vox, 37 from AS 2024-08-15 03:05:41,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2978750.0, ans=0.0 2024-08-15 03:05:41,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2978750.0, ans=0.125 2024-08-15 03:05:56,352 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 14 from Vox, 30 from AS 2024-08-15 03:05:57,659 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 8050, loss[loss=0.1049, beats_loss=0.01103, ecapa_loss=0.0001289, whisper_loss=0.09262, over 16442.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0107, ecapa_loss=0.000152, whisper_loss=0.09081, over 3878486.79 frames. ], batch size: 65, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:06:16,215 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 22 from Vox, 26 from AS 2024-08-15 03:06:26,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2978950.0, ans=0.125 2024-08-15 03:06:36,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2979050.0, ans=0.125 2024-08-15 03:06:50,431 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
16 from LS+wenet, 20 from Vox, 24 from AS 2024-08-15 03:06:54,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2979150.0, ans=0.125 2024-08-15 03:07:06,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2979250.0, ans=0.125 2024-08-15 03:07:16,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2979350.0, ans=0.125 2024-08-15 03:07:17,148 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 8100, loss[loss=0.0946, beats_loss=0.01302, ecapa_loss=0.0001477, whisper_loss=0.0801, over 20966.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01067, ecapa_loss=0.000152, whisper_loss=0.09058, over 3853324.91 frames. ], batch size: 89, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:07:46,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2979450.0, ans=0.0 2024-08-15 03:07:53,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2979550.0, ans=0.125 2024-08-15 03:07:57,108 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 30 from LS+wenet, 12 from Vox, 26 from AS 2024-08-15 03:08:06,721 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
33 from LS+wenet, 15 from Vox, 43 from AS 2024-08-15 03:08:13,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2979650.0, ans=0.2 2024-08-15 03:08:16,155 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.692e+01 2.318e+01 2.516e+01 2.878e+01 5.938e+01, threshold=5.033e+01, percent-clipped=1.0 2024-08-15 03:08:23,209 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.06 vs. limit=15.0 2024-08-15 03:08:27,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2979750.0, ans=0.125 2024-08-15 03:08:37,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2979850.0, ans=0.125 2024-08-15 03:08:38,457 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 8150, loss[loss=0.1139, beats_loss=0.01032, ecapa_loss=0.0001439, whisper_loss=0.1021, over 17072.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01064, ecapa_loss=0.0001518, whisper_loss=0.09124, over 3892059.70 frames. ], batch size: 65, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:08:50,081 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 25 from LS+wenet, 18 from Vox, 24 from AS 2024-08-15 03:08:57,947 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.00 vs. limit=15.0 2024-08-15 03:09:14,745 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 20 from Vox, 41 from AS 2024-08-15 03:09:17,759 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
29 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-15 03:09:25,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2980050.0, ans=0.125 2024-08-15 03:09:31,359 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-15 03:10:00,818 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 8200, loss[loss=0.1146, beats_loss=0.01081, ecapa_loss=0.0001801, whisper_loss=0.102, over 23208.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01062, ecapa_loss=0.0001531, whisper_loss=0.09126, over 3917935.71 frames. ], batch size: 94, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:10:02,840 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.15 vs. limit=15.0 2024-08-15 03:10:05,109 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-15 03:10:23,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2980450.0, ans=0.125 2024-08-15 03:10:33,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2980550.0, ans=0.0 2024-08-15 03:10:34,019 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.24 vs. 
limit=12.0 2024-08-15 03:10:35,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2980550.0, ans=0.125 2024-08-15 03:10:41,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2980550.0, ans=0.0 2024-08-15 03:10:43,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2980550.0, ans=0.05 2024-08-15 03:10:43,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2980550.0, ans=0.125 2024-08-15 03:10:58,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2980650.0, ans=0.1 2024-08-15 03:11:00,722 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.326e+01 2.553e+01 2.974e+01 4.367e+01, threshold=5.105e+01, percent-clipped=0.0 2024-08-15 03:11:23,106 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 8250, loss[loss=0.08361, beats_loss=0.0115, ecapa_loss=0.0001515, whisper_loss=0.07059, over 17591.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01061, ecapa_loss=0.0001527, whisper_loss=0.09073, over 3917050.47 frames. ], batch size: 71, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:11:26,548 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 32 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-15 03:11:29,204 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-15 03:12:47,528 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 8300, loss[loss=0.08457, beats_loss=0.01072, ecapa_loss=0.0001728, whisper_loss=0.07212, over 20514.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01069, ecapa_loss=0.0001524, whisper_loss=0.08997, over 3907070.25 frames. 
], batch size: 90, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:12:58,829 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.67 vs. limit=22.5 2024-08-15 03:13:08,777 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-15 03:13:28,135 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-15 03:13:28,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2981550.0, ans=0.1 2024-08-15 03:13:29,194 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=22.5 2024-08-15 03:13:35,170 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-15 03:13:47,543 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.355e+01 2.573e+01 2.835e+01 6.620e+01, threshold=5.146e+01, percent-clipped=1.0 2024-08-15 03:14:08,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2981750.0, ans=0.125 2024-08-15 03:14:08,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2981750.0, ans=0.125 2024-08-15 03:14:11,160 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 8350, loss[loss=0.1006, beats_loss=0.01231, ecapa_loss=0.0001457, whisper_loss=0.08682, over 21690.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01064, ecapa_loss=0.0001529, whisper_loss=0.0907, over 3924842.56 frames. ], batch size: 88, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:14:17,513 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
33 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-15 03:14:31,337 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 36 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-15 03:14:40,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2981950.0, ans=0.2 2024-08-15 03:14:48,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2982050.0, ans=0.07 2024-08-15 03:15:06,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2982150.0, ans=0.125 2024-08-15 03:15:08,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2982150.0, ans=0.125 2024-08-15 03:15:10,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2982150.0, ans=0.0 2024-08-15 03:15:21,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2982250.0, ans=0.07 2024-08-15 03:15:24,569 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-15 03:15:32,357 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.49 vs. limit=10.0 2024-08-15 03:15:34,325 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 8400, loss[loss=0.1184, beats_loss=0.008968, ecapa_loss=0.0001488, whisper_loss=0.1079, over 22220.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01054, ecapa_loss=0.000154, whisper_loss=0.09126, over 3937066.59 frames. ], batch size: 88, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:15:44,705 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.29 vs. 
limit=15.0 2024-08-15 03:15:53,230 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.97 vs. limit=12.0 2024-08-15 03:15:53,368 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0 2024-08-15 03:15:55,384 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-15 03:16:11,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2982550.0, ans=0.125 2024-08-15 03:16:15,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2982550.0, ans=0.2 2024-08-15 03:16:17,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2982550.0, ans=0.1 2024-08-15 03:16:18,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2982550.0, ans=0.1 2024-08-15 03:16:21,233 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-15 03:16:35,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2982650.0, ans=0.125 2024-08-15 03:16:36,524 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.324e+01 2.482e+01 2.790e+01 5.297e+01, threshold=4.963e+01, percent-clipped=1.0 2024-08-15 03:16:36,742 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 03:16:40,402 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
20 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-15 03:16:45,666 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.17 vs. limit=15.0 2024-08-15 03:16:54,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2982750.0, ans=0.0 2024-08-15 03:17:02,469 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 8450, loss[loss=0.1119, beats_loss=0.008805, ecapa_loss=0.0001388, whisper_loss=0.1017, over 18053.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01052, ecapa_loss=0.0001531, whisper_loss=0.09138, over 3907176.35 frames. ], batch size: 68, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:17:30,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2982950.0, ans=0.09899494936611666 2024-08-15 03:17:43,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2983050.0, ans=0.1 2024-08-15 03:18:02,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2983150.0, ans=0.125 2024-08-15 03:18:07,148 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 25 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-15 03:18:08,727 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
21 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-15 03:18:12,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2983250.0, ans=0.125 2024-08-15 03:18:18,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2983250.0, ans=0.0 2024-08-15 03:18:22,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2983350.0, ans=0.0 2024-08-15 03:18:23,513 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 8500, loss[loss=0.09369, beats_loss=0.01119, ecapa_loss=0.0001415, whisper_loss=0.08108, over 16887.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01056, ecapa_loss=0.0001524, whisper_loss=0.09095, over 3873236.68 frames. ], batch size: 66, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:18:36,841 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 03:19:10,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2983650.0, ans=0.2 2024-08-15 03:19:14,980 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.93 vs. 
limit=15.0 2024-08-15 03:19:19,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2983650.0, ans=0.0 2024-08-15 03:19:21,471 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.305e+01 2.558e+01 2.940e+01 1.198e+02, threshold=5.115e+01, percent-clipped=1.0 2024-08-15 03:19:28,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2983750.0, ans=0.125 2024-08-15 03:19:32,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2983750.0, ans=0.125 2024-08-15 03:19:39,273 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 14 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-15 03:19:46,077 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 8550, loss[loss=0.1058, beats_loss=0.01028, ecapa_loss=0.0001725, whisper_loss=0.09377, over 22093.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01055, ecapa_loss=0.0001526, whisper_loss=0.09066, over 3892513.54 frames. ], batch size: 91, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:19:49,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2983850.0, ans=0.2 2024-08-15 03:20:08,293 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.09 vs. limit=15.0 2024-08-15 03:20:33,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2984150.0, ans=0.125 2024-08-15 03:20:55,417 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-15 03:21:07,744 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 8600, loss[loss=0.1056, beats_loss=0.01073, ecapa_loss=0.00014, whisper_loss=0.09349, over 18015.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01053, ecapa_loss=0.000152, whisper_loss=0.09125, over 3915654.64 frames. ], batch size: 71, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:21:09,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2984350.0, ans=0.125 2024-08-15 03:21:11,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2984350.0, ans=0.0 2024-08-15 03:21:24,054 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2024-08-15 03:21:33,360 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-15 03:21:34,701 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 32 from Vox, 39 fro AS 2024-08-15 03:21:40,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2984550.0, ans=0.1 2024-08-15 03:21:51,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2984550.0, ans=0.2 2024-08-15 03:22:06,157 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.403e+01 2.689e+01 2.856e+01 3.948e+01, threshold=5.378e+01, percent-clipped=0.0 2024-08-15 03:22:09,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2984650.0, ans=0.1 2024-08-15 03:22:16,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2984750.0, ans=0.0 2024-08-15 03:22:18,526 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-15 03:22:27,693 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
21 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 03:22:29,094 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 8650, loss[loss=0.09023, beats_loss=0.01197, ecapa_loss=0.0001354, whisper_loss=0.07691, over 20575.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01052, ecapa_loss=0.0001526, whisper_loss=0.091, over 3889845.21 frames. ], batch size: 83, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:22:48,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2984950.0, ans=0.0 2024-08-15 03:23:50,555 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 15 from LS+wenet, 29 from Vox, 45 fro AS 2024-08-15 03:23:55,804 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 8700, loss[loss=0.1115, beats_loss=0.01018, ecapa_loss=0.0001342, whisper_loss=0.1, over 22151.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0106, ecapa_loss=0.0001524, whisper_loss=0.09024, over 3884009.06 frames. ], batch size: 89, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:24:08,991 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-15 03:24:11,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2985350.0, ans=0.125 2024-08-15 03:24:12,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2985350.0, ans=0.2 2024-08-15 03:24:19,595 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-15 03:24:21,994 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-15 03:24:45,547 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-15 03:25:05,336 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.67 vs. 
limit=8.0 2024-08-15 03:25:05,505 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.433e+01 2.657e+01 2.885e+01 1.161e+02, threshold=5.314e+01, percent-clipped=2.0 2024-08-15 03:25:14,153 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-15 03:25:19,450 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.66 vs. limit=15.0 2024-08-15 03:25:21,648 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 24 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-15 03:25:30,697 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 8750, loss[loss=0.1031, beats_loss=0.01081, ecapa_loss=0.000184, whisper_loss=0.09044, over 13882.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01067, ecapa_loss=0.0001516, whisper_loss=0.09026, over 3868744.76 frames. ], batch size: 56, lr: 2.96e-03, grad_scale: 1.152921504606847e+18 2024-08-15 03:25:39,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2985850.0, ans=0.125 2024-08-15 03:26:42,772 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 29 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-15 03:26:46,669 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-15 03:26:58,426 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 12 from Vox, 42 fro AS 2024-08-15 03:27:02,093 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 8800, loss[loss=0.1035, beats_loss=0.01246, ecapa_loss=0.0001196, whisper_loss=0.08986, over 19087.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0107, ecapa_loss=0.0001506, whisper_loss=0.0904, over 3835480.86 frames. ], batch size: 74, lr: 2.96e-03, grad_scale: 1.152921504606847e+18 2024-08-15 03:27:19,187 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
13 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-15 03:27:25,005 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-15 03:27:25,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2986450.0, ans=0.125 2024-08-15 03:27:33,243 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-15 03:27:36,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2986550.0, ans=0.125 2024-08-15 03:27:42,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=2986550.0, ans=15.0 2024-08-15 03:28:00,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2986650.0, ans=6.0 2024-08-15 03:28:05,500 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.266e+01 2.513e+01 2.884e+01 4.202e+01, threshold=5.025e+01, percent-clipped=0.0 2024-08-15 03:28:13,289 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 23 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-15 03:28:24,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2986750.0, ans=0.125 2024-08-15 03:28:26,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2986750.0, ans=0.035 2024-08-15 03:28:28,953 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 8850, loss[loss=0.07894, beats_loss=0.0121, ecapa_loss=0.0001557, whisper_loss=0.06528, over 16815.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01077, ecapa_loss=0.0001505, whisper_loss=0.09035, over 3846792.43 frames. 
], batch size: 68, lr: 2.96e-03, grad_scale: 1.152921504606847e+18 2024-08-15 03:28:29,164 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 27 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-15 03:29:27,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2987150.0, ans=0.0 2024-08-15 03:29:43,414 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.23 vs. limit=22.5 2024-08-15 03:29:53,799 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-15 03:29:56,662 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 8900, loss[loss=0.08997, beats_loss=0.01317, ecapa_loss=0.0001504, whisper_loss=0.0753, over 15518.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01076, ecapa_loss=0.0001504, whisper_loss=0.09079, over 3825791.43 frames. ], batch size: 62, lr: 2.96e-03, grad_scale: 1.152921504606847e+18 2024-08-15 03:30:28,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2987450.0, ans=0.0 2024-08-15 03:30:33,606 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-15 03:30:37,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2987550.0, ans=0.0 2024-08-15 03:30:38,609 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
28 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-15 03:30:57,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2987650.0, ans=0.125 2024-08-15 03:31:01,332 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.296e+01 2.671e+01 2.935e+01 5.477e+01, threshold=5.343e+01, percent-clipped=1.0 2024-08-15 03:31:05,173 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-15 03:31:27,912 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 8950, loss[loss=0.1173, beats_loss=0.007491, ecapa_loss=0.0001612, whisper_loss=0.1082, over 16262.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01073, ecapa_loss=0.0001511, whisper_loss=0.09048, over 3814201.74 frames. ], batch size: 63, lr: 2.96e-03, grad_scale: 1.152921504606847e+18 2024-08-15 03:31:39,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2987850.0, ans=0.0 2024-08-15 03:32:37,650 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-15 03:32:59,209 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 9000, loss[loss=0.08704, beats_loss=0.01366, ecapa_loss=0.0001367, whisper_loss=0.07201, over 22010.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01079, ecapa_loss=0.0001508, whisper_loss=0.08941, over 3830405.47 frames. ], batch size: 91, lr: 2.96e-03, grad_scale: 1.152921504606847e+18 2024-08-15 03:32:59,211 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-15 03:33:42,740 INFO [train_multi_KD3.py:1149] (2/4) Epoch 21, validation on ASR_libri: loss=0.2525, beats_loss=0, ecapa_loss=0.0005419, whisper_loss=0.2471, over 922467.00 frames. 2024-08-15 03:34:03,157 INFO [train_multi_KD3.py:1149] (2/4) Epoch 21, validation on SV_voxceleb1: loss=0.004236, beats_loss=0, ecapa_loss=0.0004236, whisper_loss=0, over 939242.00 frames. 
2024-08-15 03:35:55,439 INFO [train_multi_KD3.py:1149] (2/4) Epoch 21, validation on AT_audioset: loss=0.02341, beats_loss=0.02341, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 03:35:55,443 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-15 03:36:20,704 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 17 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-15 03:36:25,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2988450.0, ans=0.125 2024-08-15 03:36:28,266 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.65 vs. limit=22.5 2024-08-15 03:36:51,723 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-15 03:37:00,299 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.331e+01 2.598e+01 2.772e+01 4.416e+01, threshold=5.195e+01, percent-clipped=0.0 2024-08-15 03:37:22,809 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 9050, loss[loss=0.09543, beats_loss=0.01281, ecapa_loss=0.0001382, whisper_loss=0.08124, over 17234.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01079, ecapa_loss=0.0001497, whisper_loss=0.08973, over 3865991.58 frames. ], batch size: 70, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:37:49,135 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 32 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-15 03:37:55,345 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 27 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-15 03:37:59,615 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
20 from LS+wenet, 24 from Vox, 49 fro AS 2024-08-15 03:38:02,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2989050.0, ans=0.0 2024-08-15 03:38:26,052 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-15 03:38:32,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2989150.0, ans=0.0 2024-08-15 03:38:38,469 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-15 03:38:45,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2989250.0, ans=0.125 2024-08-15 03:38:56,702 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 9100, loss[loss=0.1088, beats_loss=0.009047, ecapa_loss=0.0001583, whisper_loss=0.09816, over 17373.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01078, ecapa_loss=0.0001515, whisper_loss=0.08944, over 3850530.26 frames. ], batch size: 66, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:38:56,947 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-15 03:39:07,582 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.28 vs. limit=22.5 2024-08-15 03:39:25,037 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-15 03:39:27,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2989450.0, ans=0.05 2024-08-15 03:39:33,077 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-15 03:39:39,936 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
24 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-15 03:39:49,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2989550.0, ans=0.125 2024-08-15 03:39:58,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2989650.0, ans=0.125 2024-08-15 03:40:10,701 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.390e+01 2.731e+01 3.078e+01 3.225e+02, threshold=5.461e+01, percent-clipped=2.0 2024-08-15 03:40:29,269 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0 2024-08-15 03:40:34,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2989850.0, ans=0.1 2024-08-15 03:40:35,398 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 9150, loss[loss=0.1023, beats_loss=0.01044, ecapa_loss=0.0001223, whisper_loss=0.09069, over 17581.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01074, ecapa_loss=0.0001512, whisper_loss=0.09008, over 3903935.12 frames. ], batch size: 67, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:40:50,521 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.02 vs. limit=22.5 2024-08-15 03:40:53,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2989950.0, ans=0.125 2024-08-15 03:41:12,230 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-15 03:41:30,062 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.30 vs. 
limit=12.0 2024-08-15 03:41:43,432 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 22 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-15 03:41:49,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2990250.0, ans=0.025 2024-08-15 03:41:50,496 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-15 03:42:03,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2990350.0, ans=0.1 2024-08-15 03:42:04,724 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 9200, loss[loss=0.1211, beats_loss=0.01025, ecapa_loss=0.0001731, whisper_loss=0.1091, over 16332.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0107, ecapa_loss=0.0001519, whisper_loss=0.08984, over 3906508.58 frames. ], batch size: 67, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:42:09,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2990350.0, ans=0.2 2024-08-15 03:42:11,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2990350.0, ans=0.1 2024-08-15 03:42:13,380 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-15 03:42:19,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2990350.0, ans=0.1 2024-08-15 03:42:26,833 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-15 03:42:33,547 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
33 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-15 03:42:52,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2990550.0, ans=0.0 2024-08-15 03:43:08,337 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-15 03:43:12,175 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.340e+01 2.560e+01 2.896e+01 2.197e+02, threshold=5.119e+01, percent-clipped=4.0 2024-08-15 03:43:17,380 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.40 vs. limit=15.0 2024-08-15 03:43:20,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2990750.0, ans=0.1 2024-08-15 03:43:28,755 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 31 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-15 03:43:35,938 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 9250, loss[loss=0.09445, beats_loss=0.01211, ecapa_loss=0.0001768, whisper_loss=0.08057, over 19902.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01076, ecapa_loss=0.0001537, whisper_loss=0.09005, over 3931637.33 frames. ], batch size: 82, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:43:55,047 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 19 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-15 03:43:57,951 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 21 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-15 03:44:11,728 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 29 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-15 03:44:15,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2991050.0, ans=0.1 2024-08-15 03:44:29,030 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
21 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-15 03:44:45,436 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-15 03:44:51,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2991250.0, ans=0.0 2024-08-15 03:45:01,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2991250.0, ans=0.2 2024-08-15 03:45:02,717 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-15 03:45:06,022 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 30 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-15 03:45:08,480 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 9300, loss[loss=0.1031, beats_loss=0.009929, ecapa_loss=0.000186, whisper_loss=0.09131, over 15317.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01078, ecapa_loss=0.0001537, whisper_loss=0.08993, over 3946958.24 frames. ], batch size: 64, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:45:34,723 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-15 03:45:45,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2991550.0, ans=0.0 2024-08-15 03:46:00,424 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-15 03:46:18,977 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.028e+01 2.307e+01 2.589e+01 2.834e+01 3.793e+01, threshold=5.178e+01, percent-clipped=0.0 2024-08-15 03:46:42,683 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 9350, loss[loss=0.1033, beats_loss=0.01222, ecapa_loss=0.0001234, whisper_loss=0.0898, over 14980.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0108, ecapa_loss=0.0001523, whisper_loss=0.08919, over 3912108.78 frames. 
], batch size: 57, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:46:48,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2991850.0, ans=0.0 2024-08-15 03:46:55,235 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-15 03:46:58,482 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 03:47:17,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2992050.0, ans=0.125 2024-08-15 03:47:29,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2992050.0, ans=0.0 2024-08-15 03:47:38,823 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 03:47:45,187 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-15 03:47:46,127 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.60 vs. limit=6.0 2024-08-15 03:47:50,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2992150.0, ans=0.0 2024-08-15 03:48:01,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2992250.0, ans=0.035 2024-08-15 03:48:08,535 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 9400, loss[loss=0.09676, beats_loss=0.01265, ecapa_loss=0.000166, whisper_loss=0.08245, over 22288.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01076, ecapa_loss=0.000153, whisper_loss=0.09001, over 3926196.18 frames. 
], batch size: 93, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:48:30,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2992450.0, ans=0.07 2024-08-15 03:48:41,750 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-15 03:48:42,382 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.37 vs. limit=15.0 2024-08-15 03:48:43,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2992550.0, ans=0.1 2024-08-15 03:48:55,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2992550.0, ans=0.125 2024-08-15 03:48:57,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2992550.0, ans=0.2 2024-08-15 03:49:01,898 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-15 03:49:04,812 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-15 03:49:11,279 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.345e+01 2.543e+01 2.847e+01 7.002e+01, threshold=5.086e+01, percent-clipped=1.0 2024-08-15 03:49:20,383 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.04 vs. 
limit=15.0 2024-08-15 03:49:25,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2992750.0, ans=0.125 2024-08-15 03:49:26,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2992750.0, ans=0.0 2024-08-15 03:49:32,166 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 9450, loss[loss=0.0789, beats_loss=0.01068, ecapa_loss=0.000166, whisper_loss=0.06656, over 13499.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01077, ecapa_loss=0.0001527, whisper_loss=0.08993, over 3922351.80 frames. ], batch size: 59, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:49:44,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2992850.0, ans=0.0 2024-08-15 03:49:49,534 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 23 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-15 03:49:52,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2992950.0, ans=0.1 2024-08-15 03:50:02,947 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-15 03:50:04,448 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-15 03:50:07,126 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-15 03:50:16,018 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-15 03:50:24,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2993150.0, ans=0.125 2024-08-15 03:50:29,911 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
20 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-15 03:50:30,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2993150.0, ans=0.125 2024-08-15 03:50:32,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2993150.0, ans=0.0 2024-08-15 03:50:42,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2993250.0, ans=0.0 2024-08-15 03:50:48,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2993250.0, ans=0.1 2024-08-15 03:50:51,268 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-15 03:50:58,888 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 9500, loss[loss=0.1001, beats_loss=0.01121, ecapa_loss=0.0001438, whisper_loss=0.08741, over 14134.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01071, ecapa_loss=0.0001534, whisper_loss=0.08937, over 3933568.15 frames. ], batch size: 54, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:51:02,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2993350.0, ans=0.125 2024-08-15 03:51:05,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2993350.0, ans=0.125 2024-08-15 03:51:22,366 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 8 from Vox, 34 fro AS 2024-08-15 03:51:40,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2993550.0, ans=0.125 2024-08-15 03:51:45,106 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
22 from LS+wenet, 14 from Vox, 52 fro AS 2024-08-15 03:52:00,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2993650.0, ans=0.125 2024-08-15 03:52:00,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2993650.0, ans=0.125 2024-08-15 03:52:00,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2993650.0, ans=0.05 2024-08-15 03:52:05,974 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.283e+01 2.545e+01 2.911e+01 4.109e+01, threshold=5.090e+01, percent-clipped=0.0 2024-08-15 03:52:18,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2993750.0, ans=0.0 2024-08-15 03:52:23,148 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 22 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-15 03:52:26,664 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-15 03:52:28,506 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 03:52:29,581 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 9550, loss[loss=0.1235, beats_loss=0.01013, ecapa_loss=0.0001339, whisper_loss=0.112, over 23586.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01073, ecapa_loss=0.0001523, whisper_loss=0.08963, over 3911753.35 frames. 
], batch size: 90, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:52:36,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2993850.0, ans=0.0 2024-08-15 03:52:38,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2993850.0, ans=0.0 2024-08-15 03:52:46,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2993950.0, ans=0.125 2024-08-15 03:52:52,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=2993950.0, ans=15.0 2024-08-15 03:53:02,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2993950.0, ans=0.125 2024-08-15 03:53:02,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2993950.0, ans=0.2 2024-08-15 03:53:04,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2993950.0, ans=0.0 2024-08-15 03:53:15,526 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.18 vs. limit=15.0 2024-08-15 03:53:35,103 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-15 03:53:58,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2994250.0, ans=0.125 2024-08-15 03:54:00,905 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 9600, loss[loss=0.09809, beats_loss=0.0107, ecapa_loss=0.0001729, whisper_loss=0.08566, over 16885.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01071, ecapa_loss=0.0001529, whisper_loss=0.0896, over 3874252.12 frames. 
], batch size: 68, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:54:03,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2994350.0, ans=0.07 2024-08-15 03:54:04,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2994350.0, ans=0.125 2024-08-15 03:54:14,760 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.76 vs. limit=15.0 2024-08-15 03:54:29,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=2994450.0, ans=0.2 2024-08-15 03:54:34,810 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 9 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-15 03:54:40,005 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-15 03:54:44,612 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-15 03:54:45,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2994550.0, ans=0.0 2024-08-15 03:54:46,031 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2024-08-15 03:55:13,059 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.340e+01 2.536e+01 2.906e+01 4.631e+01, threshold=5.072e+01, percent-clipped=0.0 2024-08-15 03:55:38,599 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 13 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-15 03:55:41,677 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
24 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-15 03:55:43,072 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 9650, loss[loss=0.09215, beats_loss=0.01357, ecapa_loss=0.0001296, whisper_loss=0.07729, over 21584.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01061, ecapa_loss=0.0001532, whisper_loss=0.0898, over 3860704.23 frames. ], batch size: 90, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:55:58,407 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 27 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-15 03:56:12,099 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-15 03:56:27,413 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 27 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-15 03:56:35,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2995050.0, ans=0.1 2024-08-15 03:57:22,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2995250.0, ans=0.0 2024-08-15 03:57:29,318 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 9700, loss[loss=0.07922, beats_loss=0.01388, ecapa_loss=0.0001342, whisper_loss=0.064, over 21423.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01055, ecapa_loss=0.0001539, whisper_loss=0.0898, over 3875938.20 frames. 
], batch size: 88, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:57:34,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2995350.0, ans=0.0 2024-08-15 03:57:46,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2995350.0, ans=0.0 2024-08-15 03:57:52,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2995350.0, ans=0.0 2024-08-15 03:57:58,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2995450.0, ans=0.125 2024-08-15 03:58:07,504 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-15 03:58:09,832 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-15 03:58:13,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2995450.0, ans=0.1 2024-08-15 03:58:16,147 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 03:58:17,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2995450.0, ans=0.125 2024-08-15 03:58:31,874 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.45 vs. limit=22.5 2024-08-15 03:59:07,520 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.62 vs. 
limit=15.0 2024-08-15 03:59:10,199 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.634e+01 2.369e+01 2.652e+01 2.894e+01 3.989e+01, threshold=5.305e+01, percent-clipped=0.0 2024-08-15 03:59:24,430 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 24 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-15 03:59:40,466 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-15 03:59:46,147 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 9750, loss[loss=0.09358, beats_loss=0.0103, ecapa_loss=0.0001319, whisper_loss=0.08196, over 15080.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01061, ecapa_loss=0.0001522, whisper_loss=0.08968, over 3831657.59 frames. ], batch size: 58, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:59:51,245 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-15 04:00:00,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2995850.0, ans=0.125 2024-08-15 04:00:53,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2996050.0, ans=0.0 2024-08-15 04:00:53,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2996050.0, ans=0.07 2024-08-15 04:01:53,962 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 9800, loss[loss=0.07777, beats_loss=0.01324, ecapa_loss=0.0001337, whisper_loss=0.06319, over 23267.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01063, ecapa_loss=0.0001516, whisper_loss=0.09008, over 3855912.69 frames. 
], batch size: 94, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:02:04,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2996350.0, ans=0.125 2024-08-15 04:02:17,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2996350.0, ans=0.1 2024-08-15 04:02:23,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2996450.0, ans=0.125 2024-08-15 04:02:36,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2996450.0, ans=0.0 2024-08-15 04:02:37,672 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 04:02:44,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2996450.0, ans=0.0 2024-08-15 04:03:07,645 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-15 04:03:25,826 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.294e+01 2.579e+01 3.082e+01 3.957e+01, threshold=5.158e+01, percent-clipped=0.0 2024-08-15 04:03:32,607 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-15 04:03:33,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2996750.0, ans=0.1 2024-08-15 04:03:45,300 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 9850, loss[loss=0.1016, beats_loss=0.01084, ecapa_loss=0.0001368, whisper_loss=0.08938, over 22737.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0106, ecapa_loss=0.0001517, whisper_loss=0.09035, over 3847649.12 frames. 
], batch size: 91, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:04:08,560 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.57 vs. limit=22.5 2024-08-15 04:04:13,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2996950.0, ans=0.125 2024-08-15 04:04:44,003 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 18 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-15 04:04:47,508 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 36 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-15 04:04:50,032 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.569e-02 2024-08-15 04:04:55,907 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 18 from LS+wenet, 10 from Vox, 45 fro AS 2024-08-15 04:05:04,880 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-15 04:05:11,117 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 9900, loss[loss=0.1204, beats_loss=0.00996, ecapa_loss=0.0001447, whisper_loss=0.109, over 23075.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01072, ecapa_loss=0.0001509, whisper_loss=0.09071, over 3881067.50 frames. ], batch size: 89, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:05:18,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2997350.0, ans=0.0 2024-08-15 04:05:24,542 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-15 04:05:26,028 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 24 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-15 04:05:33,073 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
25 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-15 04:05:57,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2997550.0, ans=0.2 2024-08-15 04:06:02,392 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 12 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-15 04:06:10,796 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-15 04:06:12,394 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.606e+01 2.308e+01 2.598e+01 3.035e+01 4.044e+01, threshold=5.195e+01, percent-clipped=0.0 2024-08-15 04:06:16,204 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 13 from LS+wenet, 24 from Vox, 19 fro AS 2024-08-15 04:06:34,727 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 9950, loss[loss=0.1088, beats_loss=0.01001, ecapa_loss=0.0002021, whisper_loss=0.0968, over 19020.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01067, ecapa_loss=0.0001517, whisper_loss=0.09122, over 3869579.24 frames. ], batch size: 80, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:06:55,485 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-15 04:07:11,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2998050.0, ans=0.1 2024-08-15 04:07:18,309 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 18 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-15 04:07:26,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2998150.0, ans=0.125 2024-08-15 04:07:57,328 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.89 vs. 
limit=22.5 2024-08-15 04:08:01,346 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 10000, loss[loss=0.1096, beats_loss=0.01054, ecapa_loss=0.000121, whisper_loss=0.09786, over 18294.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01066, ecapa_loss=0.0001526, whisper_loss=0.09137, over 3867223.36 frames. ], batch size: 70, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:08:03,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2998350.0, ans=0.125 2024-08-15 04:08:23,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2998450.0, ans=0.0 2024-08-15 04:08:26,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2998450.0, ans=0.125 2024-08-15 04:08:39,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2998550.0, ans=0.125 2024-08-15 04:08:48,193 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.50 vs. limit=15.0 2024-08-15 04:08:54,728 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
24 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-15 04:09:00,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2998650.0, ans=0.125 2024-08-15 04:09:02,404 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.337e+01 2.587e+01 2.892e+01 1.142e+02, threshold=5.175e+01, percent-clipped=1.0 2024-08-15 04:09:20,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2998750.0, ans=0.125 2024-08-15 04:09:23,496 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 10050, loss[loss=0.09718, beats_loss=0.0125, ecapa_loss=0.0001147, whisper_loss=0.08354, over 18035.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01062, ecapa_loss=0.0001528, whisper_loss=0.0914, over 3854990.92 frames. ], batch size: 70, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:09:40,446 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.47 vs. limit=22.5 2024-08-15 04:09:54,505 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.44 vs. limit=12.0 2024-08-15 04:10:01,407 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.32 vs. limit=15.0 2024-08-15 04:10:19,701 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 26 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-15 04:10:21,258 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.95 vs. limit=15.0 2024-08-15 04:10:30,036 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 10 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 04:10:37,118 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
29 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-15 04:10:40,837 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.67 vs. limit=6.0 2024-08-15 04:10:47,447 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 10100, loss[loss=0.0961, beats_loss=0.009697, ecapa_loss=0.0001735, whisper_loss=0.08467, over 17587.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01068, ecapa_loss=0.0001531, whisper_loss=0.0905, over 3861998.76 frames. ], batch size: 70, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:11:10,230 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=22.5 2024-08-15 04:11:16,903 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.88 vs. limit=15.0 2024-08-15 04:11:17,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2999550.0, ans=0.125 2024-08-15 04:11:23,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2999550.0, ans=0.125 2024-08-15 04:11:44,476 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.436e+01 2.693e+01 3.002e+01 5.180e+01, threshold=5.387e+01, percent-clipped=1.0 2024-08-15 04:11:56,228 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 04:11:59,134 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-15 04:12:01,475 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2999750.0, ans=0.125 2024-08-15 04:12:02,773 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
32 from LS+wenet, 20 from Vox, 36 from AS 2024-08-15 04:12:05,226 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 10150, loss[loss=0.08867, beats_loss=0.009056, ecapa_loss=0.000128, whisper_loss=0.07834, over 15730.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01066, ecapa_loss=0.0001545, whisper_loss=0.09061, over 3904372.66 frames. ], batch size: 59, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:12:09,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2999850.0, ans=0.125 2024-08-15 04:12:10,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2999850.0, ans=0.125 2024-08-15 04:12:20,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2999950.0, ans=0.125 2024-08-15 04:13:05,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3000150.0, ans=0.0 2024-08-15 04:13:06,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3000150.0, ans=0.125 2024-08-15 04:13:10,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3000250.0, ans=0.125 2024-08-15 04:13:16,239 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 from AS 2024-08-15 04:13:25,139 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 10200, loss[loss=0.0903, beats_loss=0.01203, ecapa_loss=0.0001525, whisper_loss=0.07674, over 17513.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01066, ecapa_loss=0.0001532, whisper_loss=0.09097, over 3885992.63 frames.
], batch size: 73, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:13:55,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=3000450.0, ans=6.0 2024-08-15 04:14:03,420 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 11 from Vox, 34 from AS 2024-08-15 04:14:05,386 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 28 from LS+wenet, 23 from Vox, 44 from AS 2024-08-15 04:14:08,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3000550.0, ans=0.0 2024-08-15 04:14:09,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3000550.0, ans=0.125 2024-08-15 04:14:13,576 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 28 from LS+wenet, 11 from Vox, 26 from AS 2024-08-15 04:14:23,541 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.355e+01 2.535e+01 2.807e+01 5.755e+01, threshold=5.070e+01, percent-clipped=1.0 2024-08-15 04:14:34,459 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 10 from LS+wenet, 15 from Vox, 32 from AS 2024-08-15 04:14:43,176 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 10250, loss[loss=0.09899, beats_loss=0.01072, ecapa_loss=0.0001017, whisper_loss=0.08725, over 16543.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01064, ecapa_loss=0.0001526, whisper_loss=0.09073, over 3865358.71 frames. ], batch size: 60, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:14:43,388 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 23 from LS+wenet, 30 from Vox, 38 from AS 2024-08-15 04:14:46,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3000850.0, ans=0.2 2024-08-15 04:14:47,451 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts.
23 from LS+wenet, 20 from Vox, 33 from AS 2024-08-15 04:14:56,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3000850.0, ans=0.125 2024-08-15 04:15:17,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3001050.0, ans=0.125 2024-08-15 04:15:20,585 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=15.19 vs. limit=15.0 2024-08-15 04:15:25,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3001050.0, ans=0.125 2024-08-15 04:15:34,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3001150.0, ans=0.0 2024-08-15 04:15:36,909 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=19.67 vs. limit=22.5 2024-08-15 04:15:42,851 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 16 from Vox, 48 from AS 2024-08-15 04:15:44,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3001250.0, ans=0.0 2024-08-15 04:16:00,692 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 10300, loss[loss=0.1065, beats_loss=0.01175, ecapa_loss=0.0001239, whisper_loss=0.09347, over 22963.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01067, ecapa_loss=0.0001519, whisper_loss=0.09009, over 3878839.19 frames. ], batch size: 87, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:16:01,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3001350.0, ans=0.1 2024-08-15 04:16:10,512 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts.
19 from LS+wenet, 26 from Vox, 32 from AS 2024-08-15 04:16:12,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3001350.0, ans=0.2 2024-08-15 04:16:54,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3001650.0, ans=0.2 2024-08-15 04:16:59,012 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.405e+01 2.691e+01 3.048e+01 4.748e+01, threshold=5.382e+01, percent-clipped=0.0 2024-08-15 04:17:16,833 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 34 from LS+wenet, 22 from Vox, 36 from AS 2024-08-15 04:17:19,575 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 10350, loss[loss=0.07593, beats_loss=0.01133, ecapa_loss=0.0001239, whisper_loss=0.06336, over 20486.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01065, ecapa_loss=0.0001514, whisper_loss=0.09068, over 3949572.63 frames. ], batch size: 79, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:17:19,761 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 24 from Vox, 31 from AS 2024-08-15 04:17:20,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3001850.0, ans=0.035 2024-08-15 04:17:27,036 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 from AS 2024-08-15 04:17:35,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3001950.0, ans=0.125 2024-08-15 04:17:42,760 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 18 from LS+wenet, 17 from Vox, 20 from AS 2024-08-15 04:17:57,204 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 16 from LS+wenet, 18 from Vox, 28 from AS 2024-08-15 04:18:21,212 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts.
23 from LS+wenet, 28 from Vox, 41 from AS 2024-08-15 04:18:25,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3002250.0, ans=0.125 2024-08-15 04:18:26,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3002250.0, ans=0.125 2024-08-15 04:18:26,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3002250.0, ans=0.0 2024-08-15 04:18:39,408 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 20 from Vox, 33 from AS 2024-08-15 04:18:40,863 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 10400, loss[loss=0.09796, beats_loss=0.0109, ecapa_loss=0.0001434, whisper_loss=0.08562, over 18647.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01059, ecapa_loss=0.0001507, whisper_loss=0.09045, over 3939402.99 frames. ], batch size: 73, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:18:41,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3002350.0, ans=0.0 2024-08-15 04:18:45,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3002350.0, ans=0.125 2024-08-15 04:19:25,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3002650.0, ans=0.0 2024-08-15 04:19:31,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3002650.0, ans=0.125 2024-08-15 04:19:34,951 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.336e+01 2.571e+01 2.760e+01 5.271e+01, threshold=5.142e+01, percent-clipped=0.0 2024-08-15 04:19:42,250 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts.
17 from LS+wenet, 15 from Vox, 23 from AS 2024-08-15 04:19:53,661 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 10450, loss[loss=0.1058, beats_loss=0.01105, ecapa_loss=0.0001823, whisper_loss=0.09297, over 21597.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01058, ecapa_loss=0.0001511, whisper_loss=0.09015, over 3892605.63 frames. ], batch size: 87, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:20:02,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3002850.0, ans=0.0 2024-08-15 04:20:06,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3002950.0, ans=0.1 2024-08-15 04:20:08,180 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.33 vs. limit=15.0 2024-08-15 04:20:21,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3003050.0, ans=0.2 2024-08-15 04:20:25,652 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 29 from Vox, 35 from AS 2024-08-15 04:20:29,481 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 27 from Vox, 38 from AS 2024-08-15 04:20:37,725 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 34 from LS+wenet, 27 from Vox, 33 from AS 2024-08-15 04:20:50,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3003250.0, ans=0.0 2024-08-15 04:21:00,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3003250.0, ans=0.125 2024-08-15 04:21:02,787 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts.
13 from LS+wenet, 18 from Vox, 31 from AS 2024-08-15 04:21:04,020 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 10500, loss[loss=0.08543, beats_loss=0.01358, ecapa_loss=0.0001446, whisper_loss=0.0704, over 15136.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001522, whisper_loss=0.09007, over 3908512.25 frames. ], batch size: 62, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:21:04,121 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 12 from LS+wenet, 17 from Vox, 28 from AS 2024-08-15 04:21:11,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3003350.0, ans=15.0 2024-08-15 04:21:53,299 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 21 from LS+wenet, 28 from Vox, 36 from AS 2024-08-15 04:21:54,328 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.267e+01 2.471e+01 2.846e+01 8.765e+01, threshold=4.941e+01, percent-clipped=1.0 2024-08-15 04:21:58,165 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.87 vs. limit=15.0 2024-08-15 04:22:12,628 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 10550, loss[loss=0.08121, beats_loss=0.01101, ecapa_loss=0.0001257, whisper_loss=0.06895, over 15156.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01058, ecapa_loss=0.0001518, whisper_loss=0.08971, over 3875954.71 frames. ], batch size: 58, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:22:19,894 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 from AS 2024-08-15 04:22:21,617 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.51 vs.
limit=15.0 2024-08-15 04:22:25,455 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.19 vs. limit=15.0 2024-08-15 04:22:27,931 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 from AS 2024-08-15 04:22:43,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3004050.0, ans=0.125 2024-08-15 04:22:53,294 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 04:22:59,852 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 12 from LS+wenet, 18 from Vox, 25 from AS 2024-08-15 04:23:06,563 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 16 from LS+wenet, 15 from Vox, 31 from AS 2024-08-15 04:23:12,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3004250.0, ans=0.125 2024-08-15 04:23:19,360 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.67 vs. limit=22.5 2024-08-15 04:23:21,334 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 10600, loss[loss=0.1103, beats_loss=0.009278, ecapa_loss=0.0001325, whisper_loss=0.09975, over 22884.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01064, ecapa_loss=0.0001513, whisper_loss=0.08898, over 3887368.93 frames. ], batch size: 88, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:23:26,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3004350.0, ans=0.2 2024-08-15 04:23:32,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3004350.0, ans=0.2 2024-08-15 04:24:02,784 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts.
28 from LS+wenet, 17 from Vox, 44 from AS 2024-08-15 04:24:07,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3004650.0, ans=0.0 2024-08-15 04:24:12,013 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.388e+01 2.629e+01 3.044e+01 4.366e+02, threshold=5.258e+01, percent-clipped=2.0 2024-08-15 04:24:13,589 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 18 from Vox, 39 from AS 2024-08-15 04:24:17,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3004750.0, ans=0.0 2024-08-15 04:24:28,072 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 15 from Vox, 34 from AS 2024-08-15 04:24:30,775 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 10650, loss[loss=0.1332, beats_loss=0.006362, ecapa_loss=0.0001538, whisper_loss=0.1253, over 20104.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01053, ecapa_loss=0.0001505, whisper_loss=0.08998, over 3881305.99 frames. ], batch size: 74, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:25:01,571 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.67 vs. limit=15.0 2024-08-15 04:25:16,861 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 30 from Vox, 35 from AS 2024-08-15 04:25:22,257 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 15 from Vox, 35 from AS 2024-08-15 04:25:29,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3005250.0, ans=0.125 2024-08-15 04:25:43,964 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 10700, loss[loss=0.105, beats_loss=0.01094, ecapa_loss=0.0001277, whisper_loss=0.09276, over 16960.00 frames.
], tot_loss[loss=0.1021, beats_loss=0.01056, ecapa_loss=0.0001505, whisper_loss=0.09004, over 3889911.39 frames. ], batch size: 67, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:25:47,363 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 21 from LS+wenet, 32 from Vox, 29 from AS 2024-08-15 04:25:59,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3005450.0, ans=0.0 2024-08-15 04:26:04,718 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.14 vs. limit=22.5 2024-08-15 04:26:09,169 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.75 vs. limit=10.0 2024-08-15 04:26:23,032 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 from AS 2024-08-15 04:26:25,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3005550.0, ans=0.0 2024-08-15 04:26:26,273 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 25 from Vox, 31 from AS 2024-08-15 04:26:34,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3005650.0, ans=0.0 2024-08-15 04:26:34,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3005650.0, ans=0.95 2024-08-15 04:26:34,555 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.10 vs.
limit=15.0 2024-08-15 04:26:39,709 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.322e+01 2.552e+01 2.947e+01 1.324e+02, threshold=5.105e+01, percent-clipped=0.0 2024-08-15 04:26:40,651 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=15.0 2024-08-15 04:26:54,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3005750.0, ans=0.125 2024-08-15 04:26:59,473 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 10750, loss[loss=0.1107, beats_loss=0.0103, ecapa_loss=0.0001721, whisper_loss=0.09867, over 21580.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01055, ecapa_loss=0.0001505, whisper_loss=0.09093, over 3904863.46 frames. ], batch size: 93, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:27:05,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3005850.0, ans=0.0 2024-08-15 04:27:19,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3005950.0, ans=0.125 2024-08-15 04:27:25,222 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.497e+01 2024-08-15 04:27:31,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3006050.0, ans=0.0 2024-08-15 04:27:33,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3006050.0, ans=0.1 2024-08-15 04:27:41,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3006050.0, ans=0.0 2024-08-15 04:27:46,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3006150.0, ans=0.0 2024-08-15 04:27:46,321 INFO 
[scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3006150.0, ans=0.09899494936611666 2024-08-15 04:27:56,486 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 40 from LS+wenet, 17 from Vox, 36 from AS 2024-08-15 04:28:07,523 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 18 from Vox, 27 from AS 2024-08-15 04:28:16,873 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 10800, loss[loss=0.126, beats_loss=0.00899, ecapa_loss=0.000137, whisper_loss=0.1157, over 23635.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01052, ecapa_loss=0.0001509, whisper_loss=0.09158, over 3917629.52 frames. ], batch size: 91, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:28:36,889 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 14 from Vox, 23 from AS 2024-08-15 04:29:06,047 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.57 vs. limit=15.0 2024-08-15 04:29:12,513 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.426e+01 2.732e+01 3.113e+01 1.619e+02, threshold=5.464e+01, percent-clipped=2.0 2024-08-15 04:29:12,866 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 25 from LS+wenet, 19 from Vox, 19 from AS 2024-08-15 04:29:17,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3006750.0, ans=0.1 2024-08-15 04:29:29,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3006750.0, ans=0.125 2024-08-15 04:29:31,773 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 10850, loss[loss=0.08812, beats_loss=0.01119, ecapa_loss=0.0001243, whisper_loss=0.07569, over 21061.00 frames.
], tot_loss[loss=0.1034, beats_loss=0.01052, ecapa_loss=0.0001511, whisper_loss=0.09132, over 3891772.35 frames. ], batch size: 84, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:29:31,956 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 from AS 2024-08-15 04:29:46,077 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.478e-03 2024-08-15 04:29:47,014 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 36 from LS+wenet, 28 from Vox, 28 from AS 2024-08-15 04:29:49,779 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 19 from Vox, 35 from AS 2024-08-15 04:30:05,628 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 22 from Vox, 39 from AS 2024-08-15 04:30:14,170 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 22 from Vox, 44 from AS 2024-08-15 04:30:21,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3007150.0, ans=0.125 2024-08-15 04:30:35,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3007250.0, ans=0.09899494936611666 2024-08-15 04:30:41,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3007250.0, ans=0.1 2024-08-15 04:30:50,672 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 10900, loss[loss=0.08469, beats_loss=0.01365, ecapa_loss=0.0001467, whisper_loss=0.06958, over 15037.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01054, ecapa_loss=0.0001524, whisper_loss=0.09135, over 3870640.59 frames.
], batch size: 64, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:30:57,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3007350.0, ans=0.125 2024-08-15 04:31:02,352 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=15.0 2024-08-15 04:31:21,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3007550.0, ans=0.125 2024-08-15 04:31:37,292 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-08-15 04:31:38,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3007650.0, ans=0.1 2024-08-15 04:31:46,028 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 31 from LS+wenet, 11 from Vox, 23 from AS 2024-08-15 04:31:47,146 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.315e+01 2.550e+01 2.913e+01 4.386e+01, threshold=5.099e+01, percent-clipped=0.0 2024-08-15 04:31:49,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3007650.0, ans=0.0 2024-08-15 04:31:51,657 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 32 from LS+wenet, 21 from Vox, 32 from AS 2024-08-15 04:31:53,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3007750.0, ans=0.125 2024-08-15 04:32:01,961 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.03 vs.
limit=10.0 2024-08-15 04:32:06,995 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 10950, loss[loss=0.1221, beats_loss=0.008657, ecapa_loss=0.0001903, whisper_loss=0.1116, over 21401.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01053, ecapa_loss=0.0001525, whisper_loss=0.0916, over 3894371.74 frames. ], batch size: 86, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:32:24,041 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 23 from Vox, 22 from AS 2024-08-15 04:32:35,607 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 28 from Vox, 34 from AS 2024-08-15 04:32:51,827 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 23 from LS+wenet, 26 from Vox, 42 from AS 2024-08-15 04:33:00,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3008150.0, ans=0.1 2024-08-15 04:33:02,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3008150.0, ans=0.2 2024-08-15 04:33:03,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3008150.0, ans=0.1 2024-08-15 04:33:22,424 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 11000, loss[loss=0.09504, beats_loss=0.0109, ecapa_loss=0.0002036, whisper_loss=0.08211, over 19011.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01049, ecapa_loss=0.0001539, whisper_loss=0.09129, over 3897592.06 frames.
], batch size: 80, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:33:24,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3008350.0, ans=0.2 2024-08-15 04:33:24,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3008350.0, ans=0.0 2024-08-15 04:33:36,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3008450.0, ans=0.125 2024-08-15 04:33:38,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3008450.0, ans=0.125 2024-08-15 04:33:38,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3008450.0, ans=0.0 2024-08-15 04:33:40,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3008450.0, ans=0.07 2024-08-15 04:33:52,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3008550.0, ans=0.2 2024-08-15 04:34:00,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3008550.0, ans=0.125 2024-08-15 04:34:05,443 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.61 vs. limit=15.0 2024-08-15 04:34:10,493 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 28 from Vox, 34 from AS 2024-08-15 04:34:14,935 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts.
24 from LS+wenet, 10 from Vox, 30 from AS 2024-08-15 04:34:16,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3008650.0, ans=0.0 2024-08-15 04:34:20,285 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.435e+01 2.579e+01 2.993e+01 2.045e+02, threshold=5.158e+01, percent-clipped=2.0 2024-08-15 04:34:22,462 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.67 vs. limit=15.0 2024-08-15 04:34:25,171 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.94 vs. limit=22.5 2024-08-15 04:34:26,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3008750.0, ans=0.125 2024-08-15 04:34:37,238 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 11050, loss[loss=0.09853, beats_loss=0.01079, ecapa_loss=0.0001416, whisper_loss=0.08632, over 18501.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01046, ecapa_loss=0.0001539, whisper_loss=0.09167, over 3919749.64 frames.
], batch size: 73, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:34:40,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3008850.0, ans=0.2 2024-08-15 04:34:40,630 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 04:34:52,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3008950.0, ans=0.0 2024-08-15 04:35:07,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3009050.0, ans=0.0 2024-08-15 04:35:15,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3009050.0, ans=0.125 2024-08-15 04:35:15,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3009050.0, ans=0.2 2024-08-15 04:35:32,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3009150.0, ans=0.0 2024-08-15 04:35:52,402 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 11100, loss[loss=0.1025, beats_loss=0.01118, ecapa_loss=0.0001069, whisper_loss=0.09025, over 17341.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.0001527, whisper_loss=0.09065, over 3905665.67 frames. ], batch size: 63, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:35:54,654 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.74 vs. limit=22.5 2024-08-15 04:36:07,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3009450.0, ans=0.125 2024-08-15 04:36:08,646 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
18 from LS+wenet, 19 from Vox, 31 from AS 2024-08-15 04:36:11,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3009450.0, ans=0.2 2024-08-15 04:36:15,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3009450.0, ans=0.0 2024-08-15 04:36:48,762 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.388e+01 2.670e+01 2.959e+01 6.163e+01, threshold=5.341e+01, percent-clipped=1.0 2024-08-15 04:36:57,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3009750.0, ans=0.125 2024-08-15 04:37:07,411 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 11150, loss[loss=0.1091, beats_loss=0.01084, ecapa_loss=0.0001387, whisper_loss=0.09684, over 18720.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01045, ecapa_loss=0.0001528, whisper_loss=0.0915, over 3942101.96 frames. ], batch size: 71, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:37:08,202 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.46 vs. limit=15.0 2024-08-15 04:37:35,354 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 27 from Vox, 38 from AS 2024-08-15 04:37:40,016 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.26 vs. 
limit=12.0 2024-08-15 04:37:45,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3010050.0, ans=0.125 2024-08-15 04:37:47,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3010050.0, ans=0.2 2024-08-15 04:37:48,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3010050.0, ans=0.125 2024-08-15 04:37:51,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3010150.0, ans=0.0 2024-08-15 04:37:56,901 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 27 from LS+wenet, 26 from Vox, 43 from AS 2024-08-15 04:37:59,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3010150.0, ans=0.0 2024-08-15 04:38:04,112 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 21 from Vox, 31 from AS 2024-08-15 04:38:19,749 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 11200, loss[loss=0.1028, beats_loss=0.01052, ecapa_loss=0.0001442, whisper_loss=0.09085, over 16751.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01045, ecapa_loss=0.0001538, whisper_loss=0.09101, over 3905940.79 frames. ], batch size: 68, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:38:30,645 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.41 vs. limit=12.0 2024-08-15 04:38:31,590 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 24 from Vox, 35 from AS 2024-08-15 04:39:09,812 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
29 from LS+wenet, 16 from Vox, 28 from AS 2024-08-15 04:39:15,042 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.986e+01 2.332e+01 2.561e+01 2.829e+01 4.358e+01, threshold=5.122e+01, percent-clipped=0.0 2024-08-15 04:39:31,271 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 18 from Vox, 38 from AS 2024-08-15 04:39:33,976 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 11250, loss[loss=0.109, beats_loss=0.01086, ecapa_loss=0.0001506, whisper_loss=0.09665, over 19261.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0105, ecapa_loss=0.000153, whisper_loss=0.09134, over 3894596.91 frames. ], batch size: 79, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:39:34,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3010850.0, ans=0.125 2024-08-15 04:40:01,182 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 27 from Vox, 35 from AS 2024-08-15 04:40:13,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3011050.0, ans=0.1 2024-08-15 04:40:20,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3011150.0, ans=0.0 2024-08-15 04:40:40,916 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 13 from Vox, 34 from AS 2024-08-15 04:40:50,870 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 11300, loss[loss=0.09835, beats_loss=0.01057, ecapa_loss=0.0001262, whisper_loss=0.08652, over 15582.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01044, ecapa_loss=0.000152, whisper_loss=0.0917, over 3873056.97 frames. 
], batch size: 60, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:41:26,785 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.98 vs. limit=22.5 2024-08-15 04:41:28,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3011550.0, ans=0.0 2024-08-15 04:41:43,476 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 32 from LS+wenet, 21 from Vox, 28 from AS 2024-08-15 04:41:48,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3011650.0, ans=0.125 2024-08-15 04:41:52,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3011650.0, ans=0.1 2024-08-15 04:41:52,942 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.314e+01 2.562e+01 2.942e+01 5.561e+01, threshold=5.125e+01, percent-clipped=1.0 2024-08-15 04:42:05,943 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 22 from LS+wenet, 29 from Vox, 29 from AS 2024-08-15 04:42:10,045 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 11350, loss[loss=0.1044, beats_loss=0.01091, ecapa_loss=0.0001432, whisper_loss=0.0921, over 21121.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01044, ecapa_loss=0.0001522, whisper_loss=0.09146, over 3853485.19 frames. 
], batch size: 85, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:42:25,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3011950.0, ans=0.0 2024-08-15 04:42:42,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3012050.0, ans=0.0 2024-08-15 04:42:52,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3012050.0, ans=0.95 2024-08-15 04:42:56,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3012150.0, ans=0.125 2024-08-15 04:42:59,166 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 18 from Vox, 22 from AS 2024-08-15 04:42:59,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3012150.0, ans=0.2 2024-08-15 04:43:00,561 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 16 from Vox, 48 from AS 2024-08-15 04:43:04,676 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 14 from LS+wenet, 15 from Vox, 29 from AS 2024-08-15 04:43:08,940 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2024-08-15 04:43:12,087 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.32 vs. limit=15.0 2024-08-15 04:43:12,942 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 28 from Vox, 37 from AS 2024-08-15 04:43:25,259 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 11400, loss[loss=0.09789, beats_loss=0.01027, ecapa_loss=0.0001365, whisper_loss=0.08626, over 16947.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01046, ecapa_loss=0.000152, whisper_loss=0.09173, over 3851256.47 frames. ], batch size: 68, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:43:26,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3012350.0, ans=0.1 2024-08-15 04:43:42,230 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 36 from LS+wenet, 30 from Vox, 28 from AS 2024-08-15 04:43:46,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3012450.0, ans=0.0 2024-08-15 04:44:19,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3012650.0, ans=0.0 2024-08-15 04:44:21,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3012650.0, ans=0.125 2024-08-15 04:44:22,708 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.412e+01 2.712e+01 2.971e+01 3.918e+01, threshold=5.424e+01, percent-clipped=0.0 2024-08-15 04:44:32,370 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.28 vs. limit=15.0 2024-08-15 04:44:34,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3012750.0, ans=0.1 2024-08-15 04:44:39,702 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 11450, loss[loss=0.1061, beats_loss=0.01109, ecapa_loss=0.0001506, whisper_loss=0.0935, over 23162.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01048, ecapa_loss=0.0001514, whisper_loss=0.09145, over 3873072.90 frames. 
], batch size: 93, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:44:53,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3012950.0, ans=0.1 2024-08-15 04:44:54,368 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 26 from Vox, 39 from AS 2024-08-15 04:45:03,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3012950.0, ans=0.0 2024-08-15 04:45:07,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3012950.0, ans=0.2 2024-08-15 04:45:12,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3013050.0, ans=0.125 2024-08-15 04:45:31,362 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 04:45:32,727 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 20 from Vox, 23 from AS 2024-08-15 04:45:43,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3013250.0, ans=0.125 2024-08-15 04:45:46,817 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 23 from Vox, 36 from AS 2024-08-15 04:45:49,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3013250.0, ans=0.125 2024-08-15 04:45:50,616 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.409e-01 2024-08-15 04:45:55,949 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 11500, loss[loss=0.1096, beats_loss=0.009116, ecapa_loss=0.0001845, whisper_loss=0.09866, over 21988.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.01045, ecapa_loss=0.0001514, whisper_loss=0.09201, over 3926774.92 frames. ], batch size: 91, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:46:14,577 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 22 from Vox, 33 from AS 2024-08-15 04:46:20,974 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 29 from Vox, 30 from AS 2024-08-15 04:46:24,649 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 26 from LS+wenet, 12 from Vox, 33 from AS 2024-08-15 04:46:31,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3013550.0, ans=0.125 2024-08-15 04:46:37,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3013650.0, ans=0.0 2024-08-15 04:46:38,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3013650.0, ans=0.125 2024-08-15 04:46:39,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3013650.0, ans=0.125 2024-08-15 04:46:47,942 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
19 from LS+wenet, 20 from Vox, 31 from AS 2024-08-15 04:46:52,430 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.370e+01 2.550e+01 2.848e+01 7.027e+01, threshold=5.100e+01, percent-clipped=1.0 2024-08-15 04:46:54,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3013750.0, ans=0.125 2024-08-15 04:47:06,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3013750.0, ans=0.125 2024-08-15 04:47:08,647 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 11550, loss[loss=0.109, beats_loss=0.008288, ecapa_loss=0.0001759, whisper_loss=0.09897, over 14519.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01053, ecapa_loss=0.0001496, whisper_loss=0.09129, over 3889360.74 frames. ], batch size: 55, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:47:20,925 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.36 vs. limit=22.5 2024-08-15 04:47:37,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3013950.0, ans=0.125 2024-08-15 04:47:41,285 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 32 from Vox, 25 from AS 2024-08-15 04:47:56,110 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
30 from LS+wenet, 31 from Vox, 23 from AS 2024-08-15 04:48:09,833 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 04:48:27,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3014250.0, ans=0.125 2024-08-15 04:48:29,179 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 11600, loss[loss=0.1125, beats_loss=0.008032, ecapa_loss=0.0001626, whisper_loss=0.1028, over 21494.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0105, ecapa_loss=0.0001519, whisper_loss=0.09164, over 3923296.08 frames. ], batch size: 83, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:48:37,944 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 18 from LS+wenet, 24 from Vox, 33 from AS 2024-08-15 04:48:38,962 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.78 vs. limit=22.5 2024-08-15 04:48:48,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3014450.0, ans=0.0 2024-08-15 04:48:53,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3014450.0, ans=0.0 2024-08-15 04:48:59,368 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 from AS 2024-08-15 04:49:18,904 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
33 from LS+wenet, 20 from Vox, 38 from AS 2024-08-15 04:49:29,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3014650.0, ans=0.125 2024-08-15 04:49:32,695 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.366e+01 2.590e+01 2.931e+01 3.199e+02, threshold=5.179e+01, percent-clipped=2.0 2024-08-15 04:49:35,275 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.93 vs. limit=15.0 2024-08-15 04:49:45,419 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0 2024-08-15 04:49:49,121 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 11650, loss[loss=0.1129, beats_loss=0.01108, ecapa_loss=0.0001393, whisper_loss=0.1004, over 23136.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01056, ecapa_loss=0.0001521, whisper_loss=0.09172, over 3933220.32 frames. ], batch size: 93, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:49:55,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3014850.0, ans=0.0 2024-08-15 04:50:15,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3014950.0, ans=0.0 2024-08-15 04:50:20,354 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 24 from LS+wenet, 26 from Vox, 46 from AS 2024-08-15 04:50:20,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3015050.0, ans=0.125 2024-08-15 04:50:33,072 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.91 vs. 
limit=15.0 2024-08-15 04:50:39,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3015150.0, ans=0.0 2024-08-15 04:50:44,431 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 15 from Vox, 34 from AS 2024-08-15 04:50:55,929 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.43 vs. limit=10.0 2024-08-15 04:50:56,687 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 17 from LS+wenet, 14 from Vox, 31 from AS 2024-08-15 04:50:56,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3015250.0, ans=0.125 2024-08-15 04:51:06,675 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 11700, loss[loss=0.1058, beats_loss=0.01038, ecapa_loss=0.0001499, whisper_loss=0.0939, over 21760.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01064, ecapa_loss=0.0001522, whisper_loss=0.09155, over 3962070.25 frames. ], batch size: 90, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:51:08,360 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 from AS 2024-08-15 04:51:23,475 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 from AS 2024-08-15 04:51:51,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3015550.0, ans=0.125 2024-08-15 04:51:53,918 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 16 from Vox, 27 from AS 2024-08-15 04:52:02,480 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 16 from Vox, 46 from AS 2024-08-15 04:52:04,956 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
13 from LS+wenet, 10 from Vox, 36 from AS 2024-08-15 04:52:05,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3015650.0, ans=0.0 2024-08-15 04:52:08,829 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.391e+01 2.584e+01 2.894e+01 1.234e+02, threshold=5.167e+01, percent-clipped=2.0 2024-08-15 04:52:16,773 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.09 vs. limit=15.0 2024-08-15 04:52:26,648 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 11750, loss[loss=0.09581, beats_loss=0.01189, ecapa_loss=0.0001273, whisper_loss=0.08265, over 17374.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01071, ecapa_loss=0.0001526, whisper_loss=0.09113, over 3979145.41 frames. ], batch size: 69, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:52:32,436 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.262e+05 2024-08-15 04:52:40,217 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
17 from LS+wenet, 16 from Vox, 29 from AS 2024-08-15 04:52:52,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3015950.0, ans=0.2 2024-08-15 04:52:57,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3016050.0, ans=0.2 2024-08-15 04:53:00,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3016050.0, ans=0.0 2024-08-15 04:53:06,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3016050.0, ans=0.07 2024-08-15 04:53:19,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3016150.0, ans=0.125 2024-08-15 04:53:31,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3016250.0, ans=0.0 2024-08-15 04:53:47,794 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 11800, loss[loss=0.1104, beats_loss=0.007951, ecapa_loss=0.0001709, whisper_loss=0.1007, over 15514.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01073, ecapa_loss=0.0001515, whisper_loss=0.0912, over 3951722.03 frames. 
], batch size: 61, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:53:58,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3016350.0, ans=0.0 2024-08-15 04:54:10,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3016450.0, ans=0.5 2024-08-15 04:54:32,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3016650.0, ans=0.0 2024-08-15 04:54:47,709 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.714e+01 2.361e+01 2.696e+01 3.037e+01 7.582e+01, threshold=5.392e+01, percent-clipped=2.0 2024-08-15 04:54:48,341 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.474e-02 2024-08-15 04:54:48,379 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.032e+01 2024-08-15 04:54:49,365 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 21 from LS+wenet, 10 from Vox, 22 from AS 2024-08-15 04:54:58,453 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.11 vs. limit=22.5 2024-08-15 04:55:00,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3016750.0, ans=0.0 2024-08-15 04:55:05,076 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 11850, loss[loss=0.09256, beats_loss=0.01357, ecapa_loss=0.0001452, whisper_loss=0.07754, over 21218.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01076, ecapa_loss=0.0001511, whisper_loss=0.09074, over 3929017.35 frames. 
], batch size: 90, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:55:07,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3016850.0, ans=0.0 2024-08-15 04:55:12,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3016850.0, ans=0.2 2024-08-15 04:55:21,546 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 23 from LS+wenet, 16 from Vox, 21 from AS 2024-08-15 04:55:35,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3017050.0, ans=0.0 2024-08-15 04:55:35,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3017050.0, ans=0.125 2024-08-15 04:56:03,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3017150.0, ans=0.125 2024-08-15 04:56:10,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3017250.0, ans=0.125 2024-08-15 04:56:11,328 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 33 from LS+wenet, 22 from Vox, 32 from AS 2024-08-15 04:56:12,077 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0 2024-08-15 04:56:20,989 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 11900, loss[loss=0.1073, beats_loss=0.01028, ecapa_loss=0.0001634, whisper_loss=0.09539, over 21961.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01082, ecapa_loss=0.000151, whisper_loss=0.09049, over 3924689.81 frames. 
], batch size: 91, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:56:26,475 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0 2024-08-15 04:56:33,682 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.54 vs. limit=15.0 2024-08-15 04:56:42,081 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=25.69 vs. limit=22.5 2024-08-15 04:56:44,502 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 16 from Vox, 30 from AS 2024-08-15 04:56:47,499 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 21 from LS+wenet, 13 from Vox, 23 from AS 2024-08-15 04:57:07,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3017650.0, ans=0.0 2024-08-15 04:57:14,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3017650.0, ans=0.2 2024-08-15 04:57:18,680 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 from AS 2024-08-15 04:57:20,146 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.277e+01 2.486e+01 2.850e+01 3.770e+01, threshold=4.972e+01, percent-clipped=0.0 2024-08-15 04:57:22,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3017750.0, ans=0.0 2024-08-15 04:57:26,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3017750.0, ans=0.125 2024-08-15 04:57:35,747 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 11950, loss[loss=0.1085, beats_loss=0.0106, ecapa_loss=0.0001124, whisper_loss=0.09683, over 16686.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01074, ecapa_loss=0.0001517, whisper_loss=0.09055, over 3899557.90 frames. ], batch size: 64, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:57:41,936 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 44 from LS+wenet, 14 from Vox, 31 from AS 2024-08-15 04:58:05,538 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 14 from LS+wenet, 23 from Vox, 24 from AS 2024-08-15 04:58:09,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3018050.0, ans=0.0 2024-08-15 04:58:15,202 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2024-08-15 04:58:27,563 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 18 from Vox, 39 from AS 2024-08-15 04:58:48,133 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 12000, loss[loss=0.0903, beats_loss=0.01424, ecapa_loss=0.0001113, whisper_loss=0.07495, over 21897.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01073, ecapa_loss=0.0001516, whisper_loss=0.09076, over 3885118.85 frames. ], batch size: 89, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:58:48,133 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-15 04:59:32,685 INFO [train_multi_KD3.py:1149] (2/4) Epoch 21, validation on ASR_libri: loss=0.2527, beats_loss=0, ecapa_loss=0.0005394, whisper_loss=0.2473, over 922467.00 frames. 2024-08-15 04:59:53,116 INFO [train_multi_KD3.py:1149] (2/4) Epoch 21, validation on SV_voxceleb1: loss=0.004335, beats_loss=0, ecapa_loss=0.0004335, whisper_loss=0, over 939242.00 frames. 2024-08-15 05:01:54,726 INFO [train_multi_KD3.py:1149] (2/4) Epoch 21, validation on AT_audioset: loss=0.02336, beats_loss=0.02336, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-15 05:01:54,730 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-15 05:01:55,548 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=22.5 2024-08-15 05:01:59,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3018350.0, ans=0.09899494936611666 2024-08-15 05:02:14,390 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.65 vs. limit=10.0 2024-08-15 05:02:15,511 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 29 from LS+wenet, 15 from Vox, 23 from AS 2024-08-15 05:02:24,390 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.14 vs. limit=15.0 2024-08-15 05:02:33,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3018550.0, ans=0.2 2024-08-15 05:02:37,755 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
31 from LS+wenet, 22 from Vox, 39 from AS 2024-08-15 05:02:50,209 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.329e+01 2.556e+01 2.882e+01 4.155e+01, threshold=5.113e+01, percent-clipped=0.0 2024-08-15 05:02:52,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3018750.0, ans=0.125 2024-08-15 05:02:56,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3018750.0, ans=0.125 2024-08-15 05:03:04,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3018850.0, ans=0.125 2024-08-15 05:03:04,941 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 12050, loss[loss=0.08989, beats_loss=0.01253, ecapa_loss=0.0001819, whisper_loss=0.07554, over 21083.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01066, ecapa_loss=0.0001525, whisper_loss=0.09135, over 3887223.09 frames. ], batch size: 91, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:03:09,795 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 43 from LS+wenet, 12 from Vox, 35 from AS 2024-08-15 05:03:11,009 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 23 from Vox, 34 from AS 2024-08-15 05:03:15,887 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=15.02 vs. limit=15.0 2024-08-15 05:03:26,182 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 35 from LS+wenet, 24 from Vox, 35 from AS 2024-08-15 05:03:32,775 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 from AS 2024-08-15 05:03:58,228 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
30 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-15 05:04:04,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3019250.0, ans=0.1 2024-08-15 05:04:05,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3019250.0, ans=0.125 2024-08-15 05:04:13,354 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 12100, loss[loss=0.1101, beats_loss=0.009154, ecapa_loss=0.0001826, whisper_loss=0.09912, over 20929.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01061, ecapa_loss=0.0001527, whisper_loss=0.09117, over 3890557.46 frames. ], batch size: 88, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:04:24,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3019350.0, ans=0.0 2024-08-15 05:04:38,442 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.53 vs. limit=6.0 2024-08-15 05:04:59,303 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.13 vs. 
limit=15.0 2024-08-15 05:05:07,568 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 05:05:08,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3019650.0, ans=0.1 2024-08-15 05:05:08,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3019650.0, ans=0.5 2024-08-15 05:05:09,571 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.350e+01 2.548e+01 2.785e+01 3.671e+01, threshold=5.096e+01, percent-clipped=0.0 2024-08-15 05:05:19,934 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 21 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-15 05:05:26,350 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 12150, loss[loss=0.1172, beats_loss=0.009852, ecapa_loss=0.0001266, whisper_loss=0.1061, over 23003.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0106, ecapa_loss=0.0001513, whisper_loss=0.09158, over 3906138.00 frames. ], batch size: 89, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:05:40,191 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.63 vs. limit=12.0 2024-08-15 05:05:46,368 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.60 vs. limit=15.0 2024-08-15 05:05:54,671 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-15 05:05:56,597 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-15 05:05:58,652 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
28 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-15 05:06:42,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3020250.0, ans=0.125 2024-08-15 05:06:46,618 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 12200, loss[loss=0.1077, beats_loss=0.009875, ecapa_loss=0.0001569, whisper_loss=0.09622, over 22342.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01052, ecapa_loss=0.0001515, whisper_loss=0.0919, over 3889098.72 frames. ], batch size: 92, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:06:49,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3020350.0, ans=0.1 2024-08-15 05:06:55,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3020350.0, ans=0.0 2024-08-15 05:07:31,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3020650.0, ans=0.2 2024-08-15 05:07:45,914 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.310e+01 2.623e+01 3.026e+01 6.571e+01, threshold=5.245e+01, percent-clipped=3.0 2024-08-15 05:07:55,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3020750.0, ans=0.125 2024-08-15 05:08:03,543 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 12250, loss[loss=0.09796, beats_loss=0.01233, ecapa_loss=0.0001646, whisper_loss=0.08399, over 21924.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01052, ecapa_loss=0.0001528, whisper_loss=0.09172, over 3882373.91 frames. 
], batch size: 92, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:08:07,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3020850.0, ans=0.05 2024-08-15 05:08:29,507 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-15 05:08:31,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3020950.0, ans=0.0 2024-08-15 05:08:47,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3021050.0, ans=0.125 2024-08-15 05:08:49,729 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-15 05:09:20,928 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 12300, loss[loss=0.08807, beats_loss=0.01035, ecapa_loss=0.0001743, whisper_loss=0.07598, over 14086.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01054, ecapa_loss=0.000153, whisper_loss=0.09149, over 3882483.37 frames. ], batch size: 58, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:09:24,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3021350.0, ans=0.125 2024-08-15 05:09:40,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3021450.0, ans=0.125 2024-08-15 05:09:43,130 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
24 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-15 05:09:48,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3021450.0, ans=0.1 2024-08-15 05:09:50,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3021550.0, ans=0.0 2024-08-15 05:10:15,851 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-15 05:10:22,767 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.390e+01 2.646e+01 2.944e+01 2.237e+02, threshold=5.293e+01, percent-clipped=1.0 2024-08-15 05:10:24,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3021750.0, ans=0.0 2024-08-15 05:10:27,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3021750.0, ans=0.0 2024-08-15 05:10:37,091 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-15 05:10:38,179 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 12350, loss[loss=0.1014, beats_loss=0.0107, ecapa_loss=0.0001583, whisper_loss=0.08911, over 14695.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01056, ecapa_loss=0.0001529, whisper_loss=0.09156, over 3876904.30 frames. ], batch size: 60, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:10:40,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3021850.0, ans=0.1 2024-08-15 05:10:58,562 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.21 vs. 
limit=6.0 2024-08-15 05:11:18,401 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3022050.0, ans=0.125 2024-08-15 05:11:20,201 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.32 vs. limit=22.5 2024-08-15 05:11:27,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3022150.0, ans=0.125 2024-08-15 05:11:30,532 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.34 vs. limit=6.0 2024-08-15 05:11:34,912 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.06 vs. limit=12.0 2024-08-15 05:11:40,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3022250.0, ans=0.1 2024-08-15 05:11:50,369 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 12400, loss[loss=0.09834, beats_loss=0.01217, ecapa_loss=0.000127, whisper_loss=0.08489, over 22359.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01052, ecapa_loss=0.0001517, whisper_loss=0.09138, over 3888544.83 frames. ], batch size: 88, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:11:59,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3022350.0, ans=0.0 2024-08-15 05:12:11,145 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 37 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-15 05:12:15,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3022450.0, ans=10.0 2024-08-15 05:12:32,489 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
18 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-15 05:12:42,908 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.329e+01 2.587e+01 2.851e+01 3.829e+01, threshold=5.175e+01, percent-clipped=0.0 2024-08-15 05:12:51,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3022750.0, ans=0.2 2024-08-15 05:12:55,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3022750.0, ans=0.0 2024-08-15 05:12:58,013 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 12450, loss[loss=0.07464, beats_loss=0.01173, ecapa_loss=0.0001734, whisper_loss=0.06118, over 17535.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01053, ecapa_loss=0.0001518, whisper_loss=0.09127, over 3888692.32 frames. ], batch size: 75, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:12:58,240 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-15 05:13:02,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3022850.0, ans=0.125 2024-08-15 05:13:06,535 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 05:13:09,561 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=22.5 2024-08-15 05:13:27,306 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.07 vs. limit=15.0 2024-08-15 05:13:29,140 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
26 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-15 05:13:40,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3023150.0, ans=0.0 2024-08-15 05:13:52,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3023250.0, ans=0.125 2024-08-15 05:14:04,753 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 12500, loss[loss=0.1245, beats_loss=0.00953, ecapa_loss=0.0001388, whisper_loss=0.1136, over 15194.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01048, ecapa_loss=0.0001521, whisper_loss=0.09147, over 3899222.44 frames. ], batch size: 58, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:14:19,493 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.91 vs. limit=10.0 2024-08-15 05:14:23,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3023450.0, ans=0.1 2024-08-15 05:14:45,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3023650.0, ans=0.0 2024-08-15 05:14:47,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3023650.0, ans=0.125 2024-08-15 05:14:52,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3023650.0, ans=0.125 2024-08-15 05:14:57,042 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.56 vs. 
limit=12.0 2024-08-15 05:14:57,798 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.324e+01 2.569e+01 2.941e+01 3.163e+02, threshold=5.138e+01, percent-clipped=2.0 2024-08-15 05:15:12,654 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 12550, loss[loss=0.1085, beats_loss=0.01066, ecapa_loss=0.0001392, whisper_loss=0.09641, over 22994.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01049, ecapa_loss=0.0001521, whisper_loss=0.09134, over 3903918.88 frames. ], batch size: 92, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:15:36,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3023950.0, ans=0.125 2024-08-15 05:15:38,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3024050.0, ans=0.0 2024-08-15 05:15:39,785 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 38 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-15 05:15:50,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3024050.0, ans=0.0 2024-08-15 05:16:02,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3024150.0, ans=0.0 2024-08-15 05:16:07,814 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-15 05:16:09,088 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 48 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-15 05:16:09,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3024250.0, ans=0.125 2024-08-15 05:16:14,480 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
23 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-15 05:16:14,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3024250.0, ans=0.2 2024-08-15 05:16:17,556 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-15 05:16:17,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3024250.0, ans=0.125 2024-08-15 05:16:20,263 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 12600, loss[loss=0.1128, beats_loss=0.008996, ecapa_loss=0.0001429, whisper_loss=0.1024, over 17509.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01055, ecapa_loss=0.0001519, whisper_loss=0.09175, over 3923860.19 frames. ], batch size: 68, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:16:23,047 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-15 05:16:29,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3024350.0, ans=0.0 2024-08-15 05:16:31,528 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.50 vs. limit=12.0 2024-08-15 05:16:37,387 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.82 vs. limit=15.0 2024-08-15 05:16:51,359 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-15 05:16:54,007 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
25 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-15 05:17:13,734 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.257e+01 2.680e+01 2.970e+01 2.910e+02, threshold=5.361e+01, percent-clipped=1.0 2024-08-15 05:17:27,279 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 12650, loss[loss=0.1045, beats_loss=0.0107, ecapa_loss=0.0001473, whisper_loss=0.09232, over 22388.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01063, ecapa_loss=0.0001517, whisper_loss=0.09118, over 3907152.87 frames. ], batch size: 91, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:17:40,869 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-15 05:17:49,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3024950.0, ans=0.0 2024-08-15 05:17:52,990 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.68 vs. limit=15.0 2024-08-15 05:18:20,211 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 28 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-15 05:18:33,337 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 12700, loss[loss=0.1229, beats_loss=0.01047, ecapa_loss=0.0001374, whisper_loss=0.1111, over 24070.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01059, ecapa_loss=0.0001524, whisper_loss=0.09163, over 3924506.70 frames. ], batch size: 93, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:18:41,863 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 17 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-15 05:19:07,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3025550.0, ans=0.125 2024-08-15 05:19:18,990 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
30 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-15 05:19:26,763 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.352e+01 2.609e+01 2.982e+01 1.854e+02, threshold=5.218e+01, percent-clipped=2.0 2024-08-15 05:19:32,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3025750.0, ans=0.0 2024-08-15 05:19:39,864 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 12750, loss[loss=0.1149, beats_loss=0.01034, ecapa_loss=0.0001503, whisper_loss=0.103, over 22182.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01062, ecapa_loss=0.0001529, whisper_loss=0.09135, over 3916872.83 frames. ], batch size: 88, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:19:40,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3025850.0, ans=0.0 2024-08-15 05:19:43,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3025850.0, ans=0.0 2024-08-15 05:19:44,141 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 15 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-15 05:19:45,998 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.99 vs. limit=15.0 2024-08-15 05:19:54,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3025950.0, ans=0.0 2024-08-15 05:19:55,150 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.41 vs. limit=22.5 2024-08-15 05:19:55,549 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
22 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-15 05:20:24,894 WARNING [optim.py:496] (2/4) Scaling gradients by 0.023750245571136475, model_norm_threshold=52.18341064453125 2024-08-15 05:20:25,059 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.496e+05, grad_sumsq=7.496e+05, orig_rms_sq=1.000e+00 2024-08-15 05:20:29,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3026150.0, ans=0.1 2024-08-15 05:20:29,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3026150.0, ans=0.1 2024-08-15 05:20:45,888 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 12800, loss[loss=0.1378, beats_loss=0.008309, ecapa_loss=0.0001432, whisper_loss=0.1281, over 24672.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01069, ecapa_loss=0.0001523, whisper_loss=0.09111, over 3912542.06 frames. ], batch size: 91, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:21:03,352 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 16 from Vox, 50 fro AS 2024-08-15 05:21:06,002 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 13 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-15 05:21:27,460 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
27 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-15 05:21:35,582 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.289e+01 2024-08-15 05:21:39,054 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.369e+01 2.658e+01 2.978e+01 2.197e+03, threshold=5.317e+01, percent-clipped=3.0 2024-08-15 05:21:44,439 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.42 vs. limit=15.0 2024-08-15 05:21:44,781 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.64 vs. limit=6.0 2024-08-15 05:21:51,590 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 27 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-15 05:21:52,662 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 12850, loss[loss=0.0945, beats_loss=0.01232, ecapa_loss=0.0001426, whisper_loss=0.08075, over 23279.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01082, ecapa_loss=0.0001518, whisper_loss=0.09, over 3928528.28 frames. ], batch size: 96, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:22:20,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3027050.0, ans=0.1 2024-08-15 05:22:31,823 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 22 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-15 05:22:35,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3027150.0, ans=0.125 2024-08-15 05:22:36,916 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 28 from LS+wenet, 17 from Vox, 16 fro AS 2024-08-15 05:22:52,408 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.23 vs. 
limit=15.0 2024-08-15 05:22:53,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3027250.0, ans=0.0 2024-08-15 05:22:57,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3027250.0, ans=0.2 2024-08-15 05:22:59,388 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 12900, loss[loss=0.1031, beats_loss=0.01334, ecapa_loss=0.0001227, whisper_loss=0.08851, over 19381.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01075, ecapa_loss=0.0001527, whisper_loss=0.0897, over 3893803.52 frames. ], batch size: 76, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:23:27,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3027550.0, ans=0.0 2024-08-15 05:23:28,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3027550.0, ans=0.1 2024-08-15 05:23:41,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3027650.0, ans=0.2 2024-08-15 05:23:44,098 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
18 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-15 05:23:47,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3027650.0, ans=0.125 2024-08-15 05:23:53,728 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.303e+01 2.501e+01 2.765e+01 4.358e+01, threshold=5.003e+01, percent-clipped=0.0 2024-08-15 05:23:55,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3027750.0, ans=0.125 2024-08-15 05:24:06,535 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 12950, loss[loss=0.09302, beats_loss=0.009057, ecapa_loss=0.0001638, whisper_loss=0.08232, over 16579.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01071, ecapa_loss=0.0001514, whisper_loss=0.08976, over 3866622.97 frames. ], batch size: 67, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:24:14,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3027850.0, ans=0.125 2024-08-15 05:24:29,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3027950.0, ans=0.125 2024-08-15 05:24:41,910 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 24 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-15 05:25:05,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3028250.0, ans=0.0 2024-08-15 05:25:13,685 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 13000, loss[loss=0.1191, beats_loss=0.01231, ecapa_loss=0.0001573, whisper_loss=0.1052, over 22792.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01068, ecapa_loss=0.000152, whisper_loss=0.09051, over 3879525.32 frames. 
], batch size: 93, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:25:17,232 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.87 vs. limit=6.0 2024-08-15 05:25:37,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3028450.0, ans=0.0 2024-08-15 05:26:00,187 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.96 vs. limit=22.5 2024-08-15 05:26:05,071 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 05:26:07,356 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.449e+01 2.686e+01 3.098e+01 1.940e+02, threshold=5.373e+01, percent-clipped=2.0 2024-08-15 05:26:09,016 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-15 05:26:09,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3028750.0, ans=0.125 2024-08-15 05:26:13,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3028750.0, ans=0.1 2024-08-15 05:26:21,092 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 13050, loss[loss=0.078, beats_loss=0.01098, ecapa_loss=0.0001173, whisper_loss=0.06585, over 14667.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01074, ecapa_loss=0.0001508, whisper_loss=0.08965, over 3853247.87 frames. ], batch size: 56, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:26:22,872 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
25 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-15 05:26:28,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3028850.0, ans=0.0 2024-08-15 05:26:28,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3028850.0, ans=0.0 2024-08-15 05:26:40,690 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.41 vs. limit=22.5 2024-08-15 05:26:41,438 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 05:26:57,603 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-15 05:27:00,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3029050.0, ans=0.125 2024-08-15 05:27:33,585 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 13100, loss[loss=0.1118, beats_loss=0.01248, ecapa_loss=0.0001763, whisper_loss=0.09757, over 18290.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01069, ecapa_loss=0.000151, whisper_loss=0.09021, over 3857589.04 frames. ], batch size: 74, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:27:39,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3029350.0, ans=0.125 2024-08-15 05:27:50,604 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-15 05:27:59,376 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.81 vs. 
limit=15.0 2024-08-15 05:28:02,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3029450.0, ans=0.1 2024-08-15 05:28:10,521 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-15 05:28:36,028 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.367e+01 2.624e+01 3.031e+01 1.630e+02, threshold=5.247e+01, percent-clipped=4.0 2024-08-15 05:28:38,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3029750.0, ans=0.0 2024-08-15 05:28:50,995 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 13150, loss[loss=0.09023, beats_loss=0.01086, ecapa_loss=0.0001955, whisper_loss=0.07741, over 16947.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01062, ecapa_loss=0.0001521, whisper_loss=0.0904, over 3881671.59 frames. ], batch size: 72, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:29:09,629 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 19 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-15 05:29:12,125 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.12 vs. limit=15.0 2024-08-15 05:29:33,841 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 05:29:34,662 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.68 vs. limit=15.0 2024-08-15 05:29:52,222 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
19 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-15 05:29:53,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3030250.0, ans=0.09899494936611666 2024-08-15 05:29:58,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3030250.0, ans=0.025 2024-08-15 05:30:00,846 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-15 05:30:05,191 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-15 05:30:09,228 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 13200, loss[loss=0.1216, beats_loss=0.007879, ecapa_loss=0.0001998, whisper_loss=0.1117, over 21514.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01059, ecapa_loss=0.0001512, whisper_loss=0.09048, over 3849714.11 frames. ], batch size: 89, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:30:11,840 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.863e-01 2024-08-15 05:30:14,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3030350.0, ans=0.125 2024-08-15 05:30:20,531 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-15 05:30:25,335 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-15 05:30:25,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3030450.0, ans=0.125 2024-08-15 05:30:25,770 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.52 vs. limit=15.0 2024-08-15 05:30:32,374 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 
16 from LS+wenet, 35 from Vox, 44 fro AS 2024-08-15 05:31:11,628 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.273e+01 2.515e+01 2.855e+01 4.648e+01, threshold=5.030e+01, percent-clipped=0.0 2024-08-15 05:31:19,242 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 21 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-15 05:31:25,534 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-15 05:31:26,952 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 13250, loss[loss=0.1088, beats_loss=0.007927, ecapa_loss=0.0001503, whisper_loss=0.09936, over 18942.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01061, ecapa_loss=0.0001525, whisper_loss=0.0905, over 3892021.11 frames. ], batch size: 71, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:31:29,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3030850.0, ans=0.1 2024-08-15 05:31:31,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3030850.0, ans=0.025 2024-08-15 05:31:36,838 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 27 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-15 05:31:56,347 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.75 vs. limit=12.0 2024-08-15 05:32:26,897 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 34 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 05:32:33,441 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
22 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-15 05:32:37,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3031250.0, ans=0.125 2024-08-15 05:32:41,698 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 13300, loss[loss=0.09414, beats_loss=0.01009, ecapa_loss=0.0001728, whisper_loss=0.08233, over 21583.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0107, ecapa_loss=0.000152, whisper_loss=0.08992, over 3890586.48 frames. ], batch size: 92, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:32:41,920 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 26 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-15 05:32:44,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3031350.0, ans=0.125 2024-08-15 05:33:15,212 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 13 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-15 05:33:24,047 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-15 05:33:28,498 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 11 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-15 05:33:31,317 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-15 05:33:35,695 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-15 05:33:41,228 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.389e+01 2.602e+01 2.951e+01 3.808e+01, threshold=5.204e+01, percent-clipped=0.0 2024-08-15 05:33:55,375 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 13350, loss[loss=0.1148, beats_loss=0.01106, ecapa_loss=0.0001363, whisper_loss=0.1024, over 22440.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01073, ecapa_loss=0.0001509, whisper_loss=0.08966, over 3912515.06 frames. 
], batch size: 88, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:33:57,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3031850.0, ans=0.1 2024-08-15 05:34:20,584 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.26 vs. limit=15.0 2024-08-15 05:34:21,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3031950.0, ans=0.04949747468305833 2024-08-15 05:34:25,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3032050.0, ans=0.125 2024-08-15 05:34:44,967 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-15 05:34:53,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3032250.0, ans=22.5 2024-08-15 05:34:54,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3032250.0, ans=0.125 2024-08-15 05:34:56,492 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-15 05:35:06,186 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 13400, loss[loss=0.09145, beats_loss=0.01206, ecapa_loss=0.0001572, whisper_loss=0.07782, over 20142.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01066, ecapa_loss=0.0001509, whisper_loss=0.09024, over 3914390.40 frames. ], batch size: 82, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:35:27,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3032450.0, ans=0.2 2024-08-15 05:35:42,551 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
30 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-15 05:36:02,280 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.320e+01 2.582e+01 2.828e+01 6.062e+01, threshold=5.164e+01, percent-clipped=1.0 2024-08-15 05:36:05,832 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-15 05:36:18,147 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 13450, loss[loss=0.08238, beats_loss=0.009891, ecapa_loss=0.0001282, whisper_loss=0.0712, over 17974.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01057, ecapa_loss=0.0001526, whisper_loss=0.09089, over 3920756.99 frames. ], batch size: 71, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:36:25,473 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 32 from Vox, 36 fro AS 2024-08-15 05:36:31,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3032950.0, ans=0.125 2024-08-15 05:36:40,989 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-15 05:36:56,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3033050.0, ans=0.2 2024-08-15 05:37:30,455 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 13500, loss[loss=0.08692, beats_loss=0.01138, ecapa_loss=0.0001418, whisper_loss=0.07412, over 21899.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01055, ecapa_loss=0.0001537, whisper_loss=0.09054, over 3889448.39 frames. ], batch size: 88, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:37:30,601 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
20 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-15 05:37:51,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3033450.0, ans=0.1 2024-08-15 05:37:54,581 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-15 05:38:10,011 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-15 05:38:12,354 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.10 vs. limit=22.5 2024-08-15 05:38:19,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3033650.0, ans=0.1 2024-08-15 05:38:23,487 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.15 vs. limit=10.0 2024-08-15 05:38:26,868 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.796e+01 2.315e+01 2.561e+01 2.861e+01 3.892e+01, threshold=5.123e+01, percent-clipped=0.0 2024-08-15 05:38:35,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3033750.0, ans=0.125 2024-08-15 05:38:37,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3033750.0, ans=0.125 2024-08-15 05:38:41,136 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 13550, loss[loss=0.1104, beats_loss=0.01096, ecapa_loss=0.0001407, whisper_loss=0.09805, over 21599.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01065, ecapa_loss=0.000153, whisper_loss=0.08994, over 3870945.10 frames. ], batch size: 84, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:38:56,729 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
33 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-15 05:39:04,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3033950.0, ans=0.0 2024-08-15 05:39:05,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3033950.0, ans=0.2 2024-08-15 05:39:08,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3033950.0, ans=0.125 2024-08-15 05:39:08,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3033950.0, ans=0.07 2024-08-15 05:39:13,793 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-15 05:39:17,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3034050.0, ans=0.0 2024-08-15 05:39:53,120 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 13600, loss[loss=0.1064, beats_loss=0.01192, ecapa_loss=0.0001451, whisper_loss=0.09299, over 22461.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01059, ecapa_loss=0.0001541, whisper_loss=0.09095, over 3911284.65 frames. ], batch size: 94, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:39:57,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3034350.0, ans=0.0 2024-08-15 05:40:13,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3034450.0, ans=0.125 2024-08-15 05:40:28,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3034550.0, ans=0.1 2024-08-15 05:40:29,127 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
17 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-15 05:40:32,867 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-15 05:40:33,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3034550.0, ans=0.0 2024-08-15 05:40:53,772 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.277e+01 2.545e+01 2.819e+01 3.866e+01, threshold=5.090e+01, percent-clipped=0.0 2024-08-15 05:40:59,604 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 22 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-15 05:41:08,353 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 13650, loss[loss=0.1066, beats_loss=0.01067, ecapa_loss=0.0001598, whisper_loss=0.09438, over 23197.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01061, ecapa_loss=0.0001538, whisper_loss=0.09046, over 3888940.01 frames. ], batch size: 95, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:41:18,172 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.73 vs. limit=12.0 2024-08-15 05:41:46,258 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 25 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-15 05:42:11,002 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-15 05:42:11,613 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=15.0 2024-08-15 05:42:14,438 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.17 vs. limit=15.0 2024-08-15 05:42:22,458 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 13700, loss[loss=0.08045, beats_loss=0.01143, ecapa_loss=0.0001828, whisper_loss=0.0672, over 12405.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01071, ecapa_loss=0.000153, whisper_loss=0.08976, over 3887978.63 frames. ], batch size: 55, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:42:23,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3035350.0, ans=0.1 2024-08-15 05:42:27,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3035350.0, ans=0.125 2024-08-15 05:42:57,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3035550.0, ans=0.125 2024-08-15 05:43:11,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3035650.0, ans=0.1 2024-08-15 05:43:13,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3035650.0, ans=0.1 2024-08-15 05:43:18,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3035650.0, ans=0.2 2024-08-15 05:43:21,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3035650.0, ans=0.2 2024-08-15 05:43:21,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3035650.0, ans=0.0 2024-08-15 05:43:21,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3035650.0, ans=0.1 2024-08-15 05:43:23,416 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.259e+01 2.458e+01 2.754e+01 9.155e+01, threshold=4.917e+01, percent-clipped=1.0 2024-08-15 05:43:34,702 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
20 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-15 05:43:38,650 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 13750, loss[loss=0.08012, beats_loss=0.01166, ecapa_loss=0.0001562, whisper_loss=0.0669, over 14131.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01064, ecapa_loss=0.0001538, whisper_loss=0.09052, over 3885932.39 frames. ], batch size: 59, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:43:47,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3035850.0, ans=0.125 2024-08-15 05:43:47,804 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2024-08-15 05:43:52,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3035850.0, ans=0.0 2024-08-15 05:43:59,886 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 25 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 05:44:49,902 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-15 05:45:00,475 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 13800, loss[loss=0.1177, beats_loss=0.01047, ecapa_loss=0.0001423, whisper_loss=0.1058, over 21282.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0106, ecapa_loss=0.0001538, whisper_loss=0.09055, over 3878544.57 frames. ], batch size: 82, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:45:00,705 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-15 05:45:05,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3036350.0, ans=0.0 2024-08-15 05:45:09,277 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
30 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-15 05:46:03,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3036650.0, ans=0.2 2024-08-15 05:46:05,926 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.641e+01 2.299e+01 2.505e+01 2.770e+01 3.939e+01, threshold=5.011e+01, percent-clipped=0.0 2024-08-15 05:46:22,224 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 13850, loss[loss=0.09705, beats_loss=0.009614, ecapa_loss=0.0001673, whisper_loss=0.08576, over 20119.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01063, ecapa_loss=0.0001526, whisper_loss=0.09054, over 3885983.88 frames. ], batch size: 83, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:46:28,072 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-15 05:46:33,235 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.86 vs. limit=15.0 2024-08-15 05:46:35,270 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 17 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-15 05:46:47,311 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
22 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-15 05:46:51,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3036950.0, ans=0.125 2024-08-15 05:46:56,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3037050.0, ans=0.125 2024-08-15 05:46:56,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3037050.0, ans=0.125 2024-08-15 05:47:01,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3037050.0, ans=0.125 2024-08-15 05:47:09,596 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-15 05:47:35,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3037250.0, ans=0.0 2024-08-15 05:47:36,224 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=15.0 2024-08-15 05:47:40,403 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 05:47:40,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3037250.0, ans=0.125 2024-08-15 05:47:42,804 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 13900, loss[loss=0.0974, beats_loss=0.0122, ecapa_loss=0.0001519, whisper_loss=0.08369, over 23638.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01061, ecapa_loss=0.0001527, whisper_loss=0.0904, over 3926132.46 frames. ], batch size: 94, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:47:45,371 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
21 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-15 05:48:03,310 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0 2024-08-15 05:48:25,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3037550.0, ans=0.0 2024-08-15 05:48:43,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3037650.0, ans=0.04949747468305833 2024-08-15 05:48:48,731 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.38 vs. limit=12.0 2024-08-15 05:48:49,086 WARNING [optim.py:496] (2/4) Scaling gradients by 0.05059259384870529, model_norm_threshold=50.10878372192383 2024-08-15 05:48:49,254 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.37, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.601e+05, grad_sumsq=3.601e+05, orig_rms_sq=1.000e+00 2024-08-15 05:48:51,772 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.379e+01 2.607e+01 2.936e+01 9.904e+02, threshold=5.213e+01, percent-clipped=4.0 2024-08-15 05:49:04,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3037750.0, ans=0.0 2024-08-15 05:49:06,870 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 13950, loss[loss=0.09959, beats_loss=0.01083, ecapa_loss=0.0001247, whisper_loss=0.08751, over 16679.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01064, ecapa_loss=0.0001516, whisper_loss=0.09039, over 3910131.69 frames. 
], batch size: 65, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:49:08,898 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.66 vs. limit=15.0 2024-08-15 05:49:41,437 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.97 vs. limit=15.0 2024-08-15 05:49:52,644 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.02 vs. limit=15.0 2024-08-15 05:50:00,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3038150.0, ans=0.5 2024-08-15 05:50:16,450 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 05:50:22,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3038250.0, ans=0.0 2024-08-15 05:50:44,593 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 14000, loss[loss=0.1095, beats_loss=0.01097, ecapa_loss=0.0001103, whisper_loss=0.0974, over 19875.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01061, ecapa_loss=0.0001509, whisper_loss=0.09029, over 3903309.49 frames. 
], batch size: 73, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:50:50,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3038350.0, ans=0.05 2024-08-15 05:50:52,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3038350.0, ans=0.0 2024-08-15 05:51:07,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3038450.0, ans=0.0 2024-08-15 05:51:11,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3038450.0, ans=0.0 2024-08-15 05:51:19,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3038450.0, ans=0.125 2024-08-15 05:51:37,787 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 27 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-15 05:51:49,141 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-15 05:51:58,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3038650.0, ans=0.125 2024-08-15 05:52:11,444 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.343e+01 2.615e+01 2.930e+01 6.184e+01, threshold=5.231e+01, percent-clipped=1.0 2024-08-15 05:52:29,232 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-15 05:52:29,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3038750.0, ans=0.125 2024-08-15 05:52:35,277 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 14050, loss[loss=0.09931, beats_loss=0.009164, ecapa_loss=0.0001307, whisper_loss=0.08884, over 19698.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01055, ecapa_loss=0.0001513, whisper_loss=0.09067, over 3862245.36 frames. ], batch size: 71, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:52:36,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3038850.0, ans=0.1 2024-08-15 05:53:04,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3038950.0, ans=0.125 2024-08-15 05:53:08,801 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-15 05:53:19,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3039050.0, ans=0.2 2024-08-15 05:53:25,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3039050.0, ans=0.125 2024-08-15 05:53:32,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3039050.0, ans=0.0 2024-08-15 05:53:42,819 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 34 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-15 05:53:52,284 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=22.5 2024-08-15 05:53:55,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3039250.0, ans=0.0 2024-08-15 05:54:13,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3039250.0, ans=0.0 2024-08-15 05:54:18,251 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 14100, loss[loss=0.09436, beats_loss=0.01018, ecapa_loss=0.0001285, whisper_loss=0.08289, over 15667.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01056, ecapa_loss=0.0001499, whisper_loss=0.09126, over 3899316.44 frames. ], batch size: 59, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:54:50,645 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 37 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-15 05:55:00,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3039550.0, ans=0.125 2024-08-15 05:55:01,783 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 17 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-15 05:55:16,961 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.29 vs. limit=15.0 2024-08-15 05:55:21,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3039650.0, ans=0.0 2024-08-15 05:55:25,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3039650.0, ans=0.125 2024-08-15 05:55:27,915 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.350e+01 2.664e+01 3.020e+01 1.564e+02, threshold=5.328e+01, percent-clipped=1.0 2024-08-15 05:55:41,443 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 14150, loss[loss=0.08865, beats_loss=0.01145, ecapa_loss=0.0001426, whisper_loss=0.07577, over 16809.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01061, ecapa_loss=0.0001498, whisper_loss=0.09101, over 3910664.92 frames. ], batch size: 67, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:55:43,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3039850.0, ans=0.0 2024-08-15 05:55:43,688 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.23 vs. 
limit=12.0 2024-08-15 05:56:00,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3039950.0, ans=0.125 2024-08-15 05:56:21,155 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.88 vs. limit=12.0 2024-08-15 05:56:29,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3040150.0, ans=0.07 2024-08-15 05:56:36,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3040150.0, ans=0.0 2024-08-15 05:56:39,257 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 31 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-15 05:56:40,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3040150.0, ans=0.1 2024-08-15 05:56:46,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3040250.0, ans=0.125 2024-08-15 05:56:58,511 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 14200, loss[loss=0.111, beats_loss=0.01031, ecapa_loss=0.0001722, whisper_loss=0.09895, over 20670.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01052, ecapa_loss=0.00015, whisper_loss=0.09184, over 3894317.53 frames. 
], batch size: 83, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:57:04,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3040350.0, ans=0.0 2024-08-15 05:57:04,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3040350.0, ans=0.125 2024-08-15 05:57:10,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3040350.0, ans=0.0 2024-08-15 05:57:14,923 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-15 05:57:30,398 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-15 05:58:00,374 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.326e+01 2.592e+01 2.924e+01 6.304e+01, threshold=5.183e+01, percent-clipped=1.0 2024-08-15 05:58:13,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=3040750.0, ans=0.1 2024-08-15 05:58:15,400 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 14250, loss[loss=0.1075, beats_loss=0.009488, ecapa_loss=0.0001586, whisper_loss=0.0964, over 23240.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01053, ecapa_loss=0.0001499, whisper_loss=0.09116, over 3899933.24 frames. ], batch size: 92, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:58:26,148 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-15 05:58:27,812 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
18 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-15 05:58:28,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3040850.0, ans=0.0 2024-08-15 05:58:47,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3041050.0, ans=0.125 2024-08-15 05:58:48,811 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 12 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-15 05:58:50,115 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-15 05:59:36,772 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 14300, loss[loss=0.0963, beats_loss=0.01098, ecapa_loss=0.0001292, whisper_loss=0.08402, over 20309.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01058, ecapa_loss=0.0001497, whisper_loss=0.08974, over 3885131.73 frames. ], batch size: 80, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:59:38,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3041350.0, ans=0.0 2024-08-15 06:00:16,271 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-15 06:00:32,463 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.646e+01 2024-08-15 06:00:34,862 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-15 06:00:36,731 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
35 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-15 06:00:44,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3041750.0, ans=0.125 2024-08-15 06:00:44,980 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.985e+01 2.480e+01 2.675e+01 2.988e+01 3.150e+02, threshold=5.350e+01, percent-clipped=2.0 2024-08-15 06:00:45,235 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 18 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-15 06:00:54,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3041750.0, ans=0.125 2024-08-15 06:00:58,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3041750.0, ans=0.125 2024-08-15 06:01:01,723 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 14350, loss[loss=0.1098, beats_loss=0.009725, ecapa_loss=0.0001248, whisper_loss=0.0988, over 19351.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01053, ecapa_loss=0.0001502, whisper_loss=0.08996, over 3926629.09 frames. ], batch size: 75, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 06:01:10,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3041850.0, ans=0.0 2024-08-15 06:01:18,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3041950.0, ans=0.125 2024-08-15 06:02:09,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3042250.0, ans=0.1 2024-08-15 06:02:18,026 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
19 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-15 06:02:19,132 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 14400, loss[loss=0.08008, beats_loss=0.0112, ecapa_loss=0.0002127, whisper_loss=0.06675, over 18259.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.0001517, whisper_loss=0.09031, over 3926009.35 frames. ], batch size: 82, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 06:02:23,458 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-15 06:02:26,462 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-15 06:02:31,054 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0 2024-08-15 06:02:35,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3042450.0, ans=0.125 2024-08-15 06:02:37,677 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-15 06:03:14,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3042650.0, ans=0.015 2024-08-15 06:03:23,798 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.353e+01 2.673e+01 3.020e+01 3.990e+01, threshold=5.347e+01, percent-clipped=0.0 2024-08-15 06:03:31,527 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 06:03:40,349 INFO [train_multi_KD3.py:1116] (2/4) Epoch 21, batch 14450, loss[loss=0.08705, beats_loss=0.01208, ecapa_loss=0.000138, whisper_loss=0.07359, over 15506.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01056, ecapa_loss=0.0001513, whisper_loss=0.09102, over 3930213.62 frames. 
], batch size: 63, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 06:03:46,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3042850.0, ans=0.125 2024-08-15 06:04:05,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3042950.0, ans=0.2 2024-08-15 06:04:14,501 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-15 06:04:23,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3043050.0, ans=0.125 2024-08-15 06:04:30,620 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-15 06:04:30,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3043150.0, ans=0.125 2024-08-15 06:04:43,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3043150.0, ans=0.07 2024-08-15 06:04:43,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3043150.0, ans=0.125 2024-08-15 06:05:22,776 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 0, loss[loss=0.07919, beats_loss=0.01059, ecapa_loss=0.0001458, whisper_loss=0.06715, over 17695.00 frames. ], tot_loss[loss=0.07919, beats_loss=0.01059, ecapa_loss=0.0001458, whisper_loss=0.06715, over 17695.00 frames. ], batch size: 71, lr: 2.86e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 06:05:22,776 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-15 06:06:01,362 INFO [train_multi_KD3.py:1149] (2/4) Epoch 22, validation on ASR_libri: loss=0.2521, beats_loss=0, ecapa_loss=0.0005383, whisper_loss=0.2468, over 922467.00 frames. 
2024-08-15 06:06:18,409 INFO [train_multi_KD3.py:1149] (2/4) Epoch 22, validation on SV_voxceleb1: loss=0.004241, beats_loss=0, ecapa_loss=0.0004241, whisper_loss=0, over 939242.00 frames. 2024-08-15 06:08:04,776 INFO [train_multi_KD3.py:1149] (2/4) Epoch 22, validation on AT_audioset: loss=0.02334, beats_loss=0.02334, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 06:08:04,779 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-15 06:08:45,684 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 18 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-15 06:08:52,561 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 23 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-15 06:09:18,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3043570.0, ans=0.2 2024-08-15 06:09:31,692 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 16 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-15 06:10:00,989 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.592e+01 2.838e+01 3.156e+01 2.932e+02, threshold=5.677e+01, percent-clipped=2.0 2024-08-15 06:10:05,679 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 50, loss[loss=0.1018, beats_loss=0.01014, ecapa_loss=0.0001489, whisper_loss=0.09015, over 17447.00 frames. ], tot_loss[loss=0.09911, beats_loss=0.00971, ecapa_loss=0.0001543, whisper_loss=0.08786, over 861831.86 frames. ], batch size: 65, lr: 2.86e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 06:10:18,918 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
24 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-15 06:10:23,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3043770.0, ans=0.125 2024-08-15 06:10:28,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3043870.0, ans=0.0 2024-08-15 06:10:49,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3043970.0, ans=0.125 2024-08-15 06:11:12,941 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.43 vs. limit=22.5 2024-08-15 06:11:19,773 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.44 vs. limit=22.5 2024-08-15 06:11:25,447 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-15 06:11:31,049 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.25 vs. limit=15.0 2024-08-15 06:11:37,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3044170.0, ans=0.0 2024-08-15 06:11:56,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3044270.0, ans=0.07 2024-08-15 06:11:57,305 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 100, loss[loss=0.1133, beats_loss=0.009123, ecapa_loss=0.0001138, whisper_loss=0.103, over 17315.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.009604, ecapa_loss=0.0001514, whisper_loss=0.08963, over 1545676.94 frames. 
], batch size: 64, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:12:11,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3044270.0, ans=10.0 2024-08-15 06:12:20,797 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 23 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-15 06:12:33,778 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.09 vs. limit=15.0 2024-08-15 06:12:48,643 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 27 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-15 06:12:48,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3044470.0, ans=0.1 2024-08-15 06:13:17,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3044570.0, ans=0.125 2024-08-15 06:13:36,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3044670.0, ans=0.0 2024-08-15 06:13:49,411 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.051e+01 2.691e+01 2.918e+01 3.263e+01 8.817e+01, threshold=5.837e+01, percent-clipped=1.0 2024-08-15 06:13:54,350 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 150, loss[loss=0.1082, beats_loss=0.009398, ecapa_loss=0.0001733, whisper_loss=0.09706, over 22505.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.009628, ecapa_loss=0.0001508, whisper_loss=0.09026, over 2048929.91 frames. 
], batch size: 93, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:14:17,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3044870.0, ans=0.1 2024-08-15 06:14:17,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3044870.0, ans=0.125 2024-08-15 06:15:20,688 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 27 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-15 06:15:24,884 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-08-15 06:15:28,616 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 200, loss[loss=0.07921, beats_loss=0.01029, ecapa_loss=0.0001621, whisper_loss=0.0673, over 21312.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.009827, ecapa_loss=0.0001514, whisper_loss=0.09055, over 2439855.98 frames. ], batch size: 87, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:15:31,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3045270.0, ans=0.125 2024-08-15 06:15:35,919 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
23 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-15 06:15:39,298 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.385e+01 2024-08-15 06:15:40,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3045270.0, ans=0.0 2024-08-15 06:15:45,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3045370.0, ans=0.0 2024-08-15 06:15:52,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=3045370.0, ans=12.0 2024-08-15 06:15:58,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3045370.0, ans=0.1 2024-08-15 06:15:58,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3045370.0, ans=0.0 2024-08-15 06:16:03,835 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.98 vs. limit=12.0 2024-08-15 06:16:07,351 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-15 06:16:14,021 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 23 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-15 06:16:29,228 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 13 from LS+wenet, 10 from Vox, 34 fro AS 2024-08-15 06:16:32,262 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 28 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-15 06:16:40,444 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-15 06:16:41,265 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.32 vs. 
limit=15.0 2024-08-15 06:16:44,402 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.328e+01 2.566e+01 2.862e+01 5.342e+01, threshold=5.131e+01, percent-clipped=0.0 2024-08-15 06:16:47,426 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 250, loss[loss=0.1034, beats_loss=0.007795, ecapa_loss=0.0001418, whisper_loss=0.09422, over 14849.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01004, ecapa_loss=0.0001497, whisper_loss=0.09052, over 2747573.08 frames. ], batch size: 54, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:16:47,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3045770.0, ans=0.0 2024-08-15 06:17:07,036 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-15 06:17:07,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3045870.0, ans=0.125 2024-08-15 06:17:17,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3045970.0, ans=0.125 2024-08-15 06:17:25,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3045970.0, ans=0.125 2024-08-15 06:17:38,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3046070.0, ans=0.0 2024-08-15 06:17:55,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3046170.0, ans=0.125 2024-08-15 06:18:05,563 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 300, loss[loss=0.0971, beats_loss=0.01089, ecapa_loss=0.0001686, whisper_loss=0.08453, over 22539.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01011, ecapa_loss=0.0001512, whisper_loss=0.09088, over 2952312.03 frames. 
], batch size: 92, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:18:05,736 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-15 06:18:09,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3046270.0, ans=15.0 2024-08-15 06:18:15,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3046270.0, ans=0.1 2024-08-15 06:18:28,776 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-15 06:18:36,634 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-15 06:18:37,992 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 15 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 06:18:50,996 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.49 vs. limit=15.0 2024-08-15 06:19:00,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3046570.0, ans=0.125 2024-08-15 06:19:04,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3046570.0, ans=0.1 2024-08-15 06:19:19,559 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.288e+01 2.599e+01 2.904e+01 1.999e+02, threshold=5.198e+01, percent-clipped=4.0 2024-08-15 06:19:22,682 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 350, loss[loss=0.09892, beats_loss=0.01291, ecapa_loss=0.0001312, whisper_loss=0.0847, over 22703.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0103, ecapa_loss=0.0001519, whisper_loss=0.0894, over 3151907.82 frames. 
], batch size: 91, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:19:42,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3046870.0, ans=0.1 2024-08-15 06:20:04,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3046970.0, ans=0.125 2024-08-15 06:20:19,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3047070.0, ans=0.0 2024-08-15 06:20:34,684 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 34 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-15 06:20:38,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3047170.0, ans=0.0 2024-08-15 06:20:39,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3047270.0, ans=0.1 2024-08-15 06:20:40,470 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 400, loss[loss=0.09813, beats_loss=0.01078, ecapa_loss=0.0001613, whisper_loss=0.08574, over 17842.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0104, ecapa_loss=0.0001498, whisper_loss=0.08936, over 3288585.99 frames. ], batch size: 74, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:20:41,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3047270.0, ans=0.2 2024-08-15 06:20:51,684 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-15 06:20:52,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3047270.0, ans=0.125 2024-08-15 06:20:57,939 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 
32 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-15 06:21:01,244 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 23 from LS+wenet, 29 from Vox, 42 fro AS 2024-08-15 06:21:12,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3047470.0, ans=0.2 2024-08-15 06:21:14,677 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-15 06:21:19,774 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 06:21:25,256 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 19 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-15 06:21:33,304 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-15 06:21:33,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3047570.0, ans=0.2 2024-08-15 06:21:36,290 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 12 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-15 06:21:44,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3047670.0, ans=0.2 2024-08-15 06:21:54,277 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.307e+01 2.559e+01 2.888e+01 1.580e+02, threshold=5.118e+01, percent-clipped=5.0 2024-08-15 06:21:57,098 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 450, loss[loss=0.09281, beats_loss=0.01031, ecapa_loss=0.0001583, whisper_loss=0.08092, over 23452.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01042, ecapa_loss=0.0001506, whisper_loss=0.08925, over 3433623.23 frames. ], batch size: 92, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:22:15,167 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.16 vs. 
limit=12.0 2024-08-15 06:22:23,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3047870.0, ans=0.125 2024-08-15 06:22:48,376 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 10 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-15 06:22:50,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3048070.0, ans=0.0 2024-08-15 06:22:51,274 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-15 06:22:52,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3048070.0, ans=0.1 2024-08-15 06:23:02,266 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-15 06:23:13,393 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 15 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-15 06:23:14,461 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 500, loss[loss=0.08583, beats_loss=0.009185, ecapa_loss=0.0001552, whisper_loss=0.0751, over 15435.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0104, ecapa_loss=0.000151, whisper_loss=0.08922, over 3514974.39 frames. 
], batch size: 58, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:23:14,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3048270.0, ans=0.0 2024-08-15 06:23:19,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3048270.0, ans=0.1 2024-08-15 06:23:34,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3048370.0, ans=0.0 2024-08-15 06:23:36,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3048370.0, ans=0.0 2024-08-15 06:23:40,251 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-15 06:23:43,160 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 16 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-15 06:23:48,632 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.30 vs. limit=15.0 2024-08-15 06:23:53,215 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.38 vs. limit=15.0 2024-08-15 06:23:59,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3048570.0, ans=0.125 2024-08-15 06:24:05,701 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.74 vs. limit=15.0 2024-08-15 06:24:28,577 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.268e+01 2.600e+01 2.909e+01 8.676e+01, threshold=5.200e+01, percent-clipped=1.0 2024-08-15 06:24:28,855 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-15 06:24:31,577 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 550, loss[loss=0.08787, beats_loss=0.01386, ecapa_loss=0.0001394, whisper_loss=0.07262, over 18705.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01042, ecapa_loss=0.0001495, whisper_loss=0.08996, over 3601970.68 frames. ], batch size: 77, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:24:40,718 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 20 from LS+wenet, 18 from Vox, 16 fro AS 2024-08-15 06:25:04,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3048970.0, ans=0.125 2024-08-15 06:25:22,190 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-15 06:25:24,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3049070.0, ans=0.2 2024-08-15 06:25:36,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3049170.0, ans=0.125 2024-08-15 06:25:41,081 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-15 06:25:41,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3049170.0, ans=0.1 2024-08-15 06:25:48,300 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 600, loss[loss=0.1018, beats_loss=0.01089, ecapa_loss=0.0001761, whisper_loss=0.08912, over 21821.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01046, ecapa_loss=0.0001485, whisper_loss=0.08947, over 3642888.21 frames. ], batch size: 91, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:25:56,700 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. 
limit=6.0 2024-08-15 06:25:56,723 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.56 vs. limit=15.0 2024-08-15 06:26:03,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3049370.0, ans=0.0 2024-08-15 06:26:04,543 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 14 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-15 06:26:10,509 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 26 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-15 06:26:23,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3049470.0, ans=0.0 2024-08-15 06:26:24,300 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 18 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-15 06:26:32,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3049470.0, ans=0.0 2024-08-15 06:26:37,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3049570.0, ans=0.125 2024-08-15 06:26:41,711 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 29 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-15 06:26:56,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3049670.0, ans=0.125 2024-08-15 06:26:59,388 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
23 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-15 06:27:03,920 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.381e+01 2.532e+01 2.729e+01 4.299e+01, threshold=5.065e+01, percent-clipped=0.0 2024-08-15 06:27:04,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3049670.0, ans=0.125 2024-08-15 06:27:07,446 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 650, loss[loss=0.1065, beats_loss=0.007275, ecapa_loss=0.0002503, whisper_loss=0.09676, over 12941.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01043, ecapa_loss=0.000149, whisper_loss=0.08966, over 3670491.87 frames. ], batch size: 58, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:27:12,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3049770.0, ans=0.125 2024-08-15 06:27:33,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3049870.0, ans=0.125 2024-08-15 06:27:38,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3049970.0, ans=0.125 2024-08-15 06:28:10,543 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-15 06:28:10,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3050170.0, ans=0.125 2024-08-15 06:28:23,810 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 700, loss[loss=0.104, beats_loss=0.009396, ecapa_loss=0.0001444, whisper_loss=0.09319, over 22948.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0105, ecapa_loss=0.0001494, whisper_loss=0.08898, over 3680542.90 frames. ], batch size: 90, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:28:23,946 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
21 from LS+wenet, 12 from Vox, 28 from AS 2024-08-15 06:28:27,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3050270.0, ans=0.07 2024-08-15 06:28:38,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3050370.0, ans=0.125 2024-08-15 06:29:14,673 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 38 from LS+wenet, 15 from Vox, 32 from AS 2024-08-15 06:29:28,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3050670.0, ans=0.0 2024-08-15 06:29:30,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3050670.0, ans=0.125 2024-08-15 06:29:32,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3050670.0, ans=0.05 2024-08-15 06:29:34,372 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 17 from LS+wenet, 12 from Vox, 24 from AS 2024-08-15 06:29:38,403 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.268e+01 2.485e+01 2.982e+01 6.162e+01, threshold=4.969e+01, percent-clipped=2.0 2024-08-15 06:29:41,992 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 750, loss[loss=0.1143, beats_loss=0.008249, ecapa_loss=0.0001611, whisper_loss=0.1044, over 17675.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01053, ecapa_loss=0.0001489, whisper_loss=0.08887, over 3701089.86 frames. ], batch size: 67, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:30:11,293 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. 
limit=6.0 2024-08-15 06:30:14,564 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.18 vs. limit=15.0 2024-08-15 06:30:57,838 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 800, loss[loss=0.104, beats_loss=0.00766, ecapa_loss=0.0001929, whisper_loss=0.09441, over 17586.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01056, ecapa_loss=0.000149, whisper_loss=0.08877, over 3742336.48 frames. ], batch size: 73, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:31:10,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3051270.0, ans=0.2 2024-08-15 06:31:22,721 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3051370.0, ans=0.015 2024-08-15 06:31:29,792 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 13 from Vox, 37 from AS 2024-08-15 06:31:38,133 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 12 from LS+wenet, 24 from Vox, 41 from AS 2024-08-15 06:32:00,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3051670.0, ans=0.1 2024-08-15 06:32:11,052 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 06:32:11,123 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=15.0 2024-08-15 06:32:11,965 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.305e+01 2.508e+01 2.943e+01 4.012e+02, threshold=5.016e+01, percent-clipped=1.0 2024-08-15 06:32:14,932 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 850, loss[loss=0.08385, beats_loss=0.01007, ecapa_loss=0.0001722, whisper_loss=0.07206, over 20739.00 frames. 
], tot_loss[loss=0.09975, beats_loss=0.01061, ecapa_loss=0.000148, whisper_loss=0.08766, over 3736108.79 frames. ], batch size: 86, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:32:25,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3051770.0, ans=0.125 2024-08-15 06:32:27,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3051770.0, ans=10.0 2024-08-15 06:32:27,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3051770.0, ans=0.2 2024-08-15 06:32:36,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3051870.0, ans=0.1 2024-08-15 06:32:42,020 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 22 from LS+wenet, 22 from Vox, 43 from AS 2024-08-15 06:33:07,958 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.29 vs. limit=15.0 2024-08-15 06:33:33,835 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 900, loss[loss=0.105, beats_loss=0.0102, ecapa_loss=0.000169, whisper_loss=0.09314, over 22434.00 frames. ], tot_loss[loss=0.09991, beats_loss=0.01057, ecapa_loss=0.000148, whisper_loss=0.08787, over 3763631.06 frames. ], batch size: 91, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:33:34,076 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
28 from LS+wenet, 22 from Vox, 35 from AS 2024-08-15 06:33:39,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3052270.0, ans=0.125 2024-08-15 06:33:48,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3052370.0, ans=0.125 2024-08-15 06:34:09,274 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 21 from Vox, 22 from AS 2024-08-15 06:34:20,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3052570.0, ans=0.0 2024-08-15 06:34:37,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3052670.0, ans=0.0 2024-08-15 06:34:37,955 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.80 vs. limit=15.0 2024-08-15 06:34:47,335 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.316e+01 2.568e+01 3.064e+01 1.106e+02, threshold=5.136e+01, percent-clipped=1.0 2024-08-15 06:34:50,214 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 950, loss[loss=0.1066, beats_loss=0.01043, ecapa_loss=0.0001472, whisper_loss=0.09469, over 18067.00 frames. ], tot_loss[loss=0.09991, beats_loss=0.01056, ecapa_loss=0.0001476, whisper_loss=0.08787, over 3772430.14 frames. ], batch size: 71, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:35:16,477 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
20 from LS+wenet, 19 from Vox, 31 from AS 2024-08-15 06:35:36,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3053070.0, ans=0.1 2024-08-15 06:35:39,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3053070.0, ans=0.0 2024-08-15 06:35:41,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3053070.0, ans=0.0 2024-08-15 06:35:50,224 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 from AS 2024-08-15 06:36:08,416 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 1000, loss[loss=0.1259, beats_loss=0.0072, ecapa_loss=0.0001574, whisper_loss=0.1172, over 17012.00 frames. ], tot_loss[loss=0.09995, beats_loss=0.01062, ecapa_loss=0.0001476, whisper_loss=0.08786, over 3785059.32 frames. ], batch size: 63, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:36:08,759 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 from AS 2024-08-15 06:36:15,887 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=12.0 2024-08-15 06:36:19,500 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.06 vs. limit=15.0 2024-08-15 06:36:26,751 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 15 from Vox, 46 from AS 2024-08-15 06:36:32,176 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.15 vs. limit=15.0 2024-08-15 06:36:48,233 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
24 from LS+wenet, 27 from Vox, 29 from AS 2024-08-15 06:37:00,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3053570.0, ans=0.125 2024-08-15 06:37:11,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3053670.0, ans=0.125 2024-08-15 06:37:15,791 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 24 from Vox, 34 from AS 2024-08-15 06:37:23,052 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.275e+01 2.548e+01 2.900e+01 4.496e+01, threshold=5.097e+01, percent-clipped=0.0 2024-08-15 06:37:26,138 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 1050, loss[loss=0.0647, beats_loss=0.01273, ecapa_loss=0.0001425, whisper_loss=0.05054, over 19353.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01059, ecapa_loss=0.0001475, whisper_loss=0.08836, over 3831664.82 frames. ], batch size: 83, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:37:37,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3053770.0, ans=0.2 2024-08-15 06:37:40,252 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
15 from LS+wenet, 21 from Vox, 29 from AS 2024-08-15 06:37:41,385 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08964061737060547, model_norm_threshold=50.96524429321289 2024-08-15 06:37:41,553 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.34, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.115e+05, grad_sumsq=1.116e+07, orig_rms_sq=9.994e-03 2024-08-15 06:37:48,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3053870.0, ans=0.125 2024-08-15 06:37:54,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3053870.0, ans=0.125 2024-08-15 06:38:06,721 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 31 from LS+wenet, 20 from Vox, 30 from AS 2024-08-15 06:38:13,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3054070.0, ans=0.1 2024-08-15 06:38:19,401 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.23 vs. limit=15.0 2024-08-15 06:38:31,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3054170.0, ans=0.2 2024-08-15 06:38:43,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3054270.0, ans=15.0 2024-08-15 06:38:43,802 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 1100, loss[loss=0.083, beats_loss=0.01348, ecapa_loss=0.000117, whisper_loss=0.06835, over 16968.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01059, ecapa_loss=0.0001471, whisper_loss=0.08874, over 3835671.73 frames. 
], batch size: 69, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:38:44,746 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.72 vs. limit=15.0 2024-08-15 06:39:04,429 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.52 vs. limit=12.0 2024-08-15 06:39:07,144 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.15 vs. limit=12.0 2024-08-15 06:39:09,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3054370.0, ans=0.0 2024-08-15 06:39:57,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3054570.0, ans=0.0 2024-08-15 06:40:32,058 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.402e+01 2.652e+01 3.039e+01 5.686e+02, threshold=5.304e+01, percent-clipped=1.0 2024-08-15 06:40:32,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3054670.0, ans=10.0 2024-08-15 06:40:34,926 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 1150, loss[loss=0.1296, beats_loss=0.009234, ecapa_loss=0.0001649, whisper_loss=0.1187, over 18845.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01055, ecapa_loss=0.0001473, whisper_loss=0.08887, over 3835748.38 frames. ], batch size: 74, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:40:42,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3054770.0, ans=0.125 2024-08-15 06:40:44,319 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
12 from LS+wenet, 19 from Vox, 23 from AS 2024-08-15 06:41:02,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3054870.0, ans=0.0 2024-08-15 06:41:08,579 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 14 from Vox, 33 from AS 2024-08-15 06:41:32,293 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 from AS 2024-08-15 06:41:34,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3055070.0, ans=0.2 2024-08-15 06:41:49,059 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 21 from LS+wenet, 16 from Vox, 21 from AS 2024-08-15 06:42:02,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3055270.0, ans=0.125 2024-08-15 06:42:03,249 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 1200, loss[loss=0.1244, beats_loss=0.00729, ecapa_loss=0.0001401, whisper_loss=0.1157, over 17304.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01062, ecapa_loss=0.0001482, whisper_loss=0.08857, over 3838114.12 frames. ], batch size: 64, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:42:19,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3055270.0, ans=0.5 2024-08-15 06:42:19,422 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.96 vs. limit=15.0 2024-08-15 06:42:40,308 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 19 from Vox, 29 from AS 2024-08-15 06:42:52,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3055470.0, ans=0.1 2024-08-15 06:43:03,007 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
26 from LS+wenet, 18 from Vox, 38 from AS 2024-08-15 06:43:06,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3055470.0, ans=0.0 2024-08-15 06:43:33,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3055670.0, ans=0.125 2024-08-15 06:43:43,615 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.652e+01 2.260e+01 2.457e+01 2.910e+01 3.777e+01, threshold=4.914e+01, percent-clipped=0.0 2024-08-15 06:43:47,767 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.54 vs. limit=12.0 2024-08-15 06:43:48,956 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 1250, loss[loss=0.09359, beats_loss=0.008079, ecapa_loss=0.0001545, whisper_loss=0.08397, over 19579.00 frames. ], tot_loss[loss=0.09998, beats_loss=0.01067, ecapa_loss=0.0001481, whisper_loss=0.08783, over 3828019.11 frames. ], batch size: 73, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:44:14,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3055870.0, ans=0.125 2024-08-15 06:44:30,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3055870.0, ans=0.125 2024-08-15 06:44:43,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3055970.0, ans=0.125 2024-08-15 06:44:46,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3055970.0, ans=0.1 2024-08-15 06:44:50,725 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.61 vs. 
limit=22.5 2024-08-15 06:44:57,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3055970.0, ans=0.0 2024-08-15 06:45:17,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3056070.0, ans=0.125 2024-08-15 06:45:53,418 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 1300, loss[loss=0.1168, beats_loss=0.008853, ecapa_loss=0.0001636, whisper_loss=0.1063, over 20459.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01072, ecapa_loss=0.0001481, whisper_loss=0.08806, over 3850201.53 frames. ], batch size: 84, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:46:01,751 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 29 from Vox, 35 from AS 2024-08-15 06:46:09,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3056270.0, ans=0.125 2024-08-15 06:46:16,948 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 20 from Vox, 34 from AS 2024-08-15 06:46:19,064 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.58 vs. limit=15.0 2024-08-15 06:46:44,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3056470.0, ans=0.0 2024-08-15 06:46:48,910 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 18 from Vox, 35 from AS 2024-08-15 06:46:52,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3056470.0, ans=0.0 2024-08-15 06:47:02,489 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
33 from LS+wenet, 22 from Vox, 33 from AS 2024-08-15 06:47:51,354 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.719e+01 2.277e+01 2.465e+01 2.867e+01 3.912e+01, threshold=4.931e+01, percent-clipped=0.0 2024-08-15 06:47:55,990 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 1350, loss[loss=0.08823, beats_loss=0.01189, ecapa_loss=0.0001478, whisper_loss=0.07487, over 20659.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01071, ecapa_loss=0.0001477, whisper_loss=0.08885, over 3860323.40 frames. ], batch size: 86, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:47:58,857 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 from AS 2024-08-15 06:48:00,313 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 16 from LS+wenet, 25 from Vox, 45 from AS 2024-08-15 06:48:14,649 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 14 from Vox, 36 from AS 2024-08-15 06:48:35,768 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 27 from Vox, 39 from AS 2024-08-15 06:49:01,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3056970.0, ans=0.125 2024-08-15 06:49:28,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3057070.0, ans=0.0 2024-08-15 06:49:34,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3057170.0, ans=0.0 2024-08-15 06:49:42,655 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.62 vs. limit=15.0 2024-08-15 06:49:43,702 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
7 from LS+wenet, 24 from Vox, 32 from AS 2024-08-15 06:49:48,976 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 1400, loss[loss=0.1131, beats_loss=0.007711, ecapa_loss=0.000179, whisper_loss=0.1036, over 20836.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01068, ecapa_loss=0.0001477, whisper_loss=0.0882, over 3842944.08 frames. ], batch size: 83, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:49:55,905 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.12 vs. limit=15.0 2024-08-15 06:50:01,474 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 22 from LS+wenet, 9 from Vox, 30 from AS 2024-08-15 06:50:15,821 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 from AS 2024-08-15 06:50:21,017 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 21 from LS+wenet, 9 from Vox, 27 from AS 2024-08-15 06:50:29,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3057470.0, ans=0.1 2024-08-15 06:50:37,696 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.72 vs. limit=15.0 2024-08-15 06:50:52,287 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 25 from LS+wenet, 16 from Vox, 18 from AS 2024-08-15 06:50:53,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3057570.0, ans=0.1 2024-08-15 06:51:07,251 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
27 from LS+wenet, 22 from Vox, 44 from AS 2024-08-15 06:51:09,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3057670.0, ans=0.125 2024-08-15 06:51:13,449 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.207e+01 2.496e+01 2.856e+01 4.886e+01, threshold=4.993e+01, percent-clipped=0.0 2024-08-15 06:51:56,674 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 1450, loss[loss=0.1208, beats_loss=0.007372, ecapa_loss=0.0001625, whisper_loss=0.1118, over 17911.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01064, ecapa_loss=0.0001469, whisper_loss=0.08816, over 3824436.93 frames. ], batch size: 68, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:52:33,643 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.92 vs. limit=15.0 2024-08-15 06:52:41,766 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 19 from LS+wenet, 22 from Vox, 31 from AS 2024-08-15 06:53:14,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3058170.0, ans=0.0 2024-08-15 06:53:17,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3058170.0, ans=0.1 2024-08-15 06:53:28,630 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 1500, loss[loss=0.09549, beats_loss=0.01235, ecapa_loss=9.916e-05, whisper_loss=0.08215, over 22952.00 frames. ], tot_loss[loss=0.09981, beats_loss=0.0107, ecapa_loss=0.0001471, whisper_loss=0.08764, over 3841159.50 frames. 
], batch size: 86, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:53:29,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3058270.0, ans=0.125 2024-08-15 06:54:00,885 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.11 vs. limit=22.5 2024-08-15 06:54:09,613 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 15 from Vox, 33 from AS 2024-08-15 06:54:09,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3058470.0, ans=0.0 2024-08-15 06:54:36,831 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 19 from Vox, 28 from AS 2024-08-15 06:54:37,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3058570.0, ans=0.1 2024-08-15 06:54:58,680 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.699e+01 2.187e+01 2.410e+01 2.690e+01 4.725e+01, threshold=4.819e+01, percent-clipped=0.0 2024-08-15 06:54:58,854 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 24 from Vox, 35 from AS 2024-08-15 06:55:02,243 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 1550, loss[loss=0.1284, beats_loss=0.01061, ecapa_loss=0.0001359, whisper_loss=0.1165, over 20036.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01065, ecapa_loss=0.0001472, whisper_loss=0.08839, over 3826120.31 frames. 
], batch size: 73, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:55:18,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3058770.0, ans=0.125 2024-08-15 06:55:28,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3058870.0, ans=0.125 2024-08-15 06:55:45,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3058970.0, ans=0.1 2024-08-15 06:55:45,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3058970.0, ans=0.04949747468305833 2024-08-15 06:55:48,415 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 28 from Vox, 40 from AS 2024-08-15 06:56:24,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3059170.0, ans=0.2 2024-08-15 06:56:26,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3059170.0, ans=0.0 2024-08-15 06:56:32,667 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 1600, loss[loss=0.1144, beats_loss=0.008979, ecapa_loss=0.0001599, whisper_loss=0.1038, over 17831.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01047, ecapa_loss=0.0001483, whisper_loss=0.08907, over 3818183.88 frames. 
], batch size: 69, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:56:38,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3059270.0, ans=15.0 2024-08-15 06:56:48,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3059270.0, ans=0.125 2024-08-15 06:56:53,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3059370.0, ans=0.07 2024-08-15 06:56:55,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3059370.0, ans=0.125 2024-08-15 06:57:02,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3059370.0, ans=0.1 2024-08-15 06:57:06,585 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 23 from Vox, 31 from AS 2024-08-15 06:57:07,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3059470.0, ans=0.125 2024-08-15 06:57:33,815 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 34 from LS+wenet, 22 from Vox, 19 from AS 2024-08-15 06:57:41,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3059570.0, ans=0.125 2024-08-15 06:57:59,111 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.263e+01 2.473e+01 2.776e+01 4.144e+01, threshold=4.945e+01, percent-clipped=0.0 2024-08-15 06:58:02,336 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 1650, loss[loss=0.1293, beats_loss=0.007674, ecapa_loss=0.0001755, whisper_loss=0.1198, over 23134.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01047, ecapa_loss=0.0001471, whisper_loss=0.08983, over 3838723.20 frames. 
], batch size: 92, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:58:08,441 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0 2024-08-15 06:58:18,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3059770.0, ans=0.0 2024-08-15 06:58:30,561 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 18 from Vox, 29 from AS 2024-08-15 06:58:33,938 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.81 vs. limit=10.0 2024-08-15 06:58:46,753 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 15 from LS+wenet, 26 from Vox, 40 from AS 2024-08-15 06:59:16,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3060170.0, ans=0.1 2024-08-15 06:59:23,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3060170.0, ans=22.5 2024-08-15 06:59:24,618 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 31 from Vox, 31 from AS 2024-08-15 06:59:30,753 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 1700, loss[loss=0.08621, beats_loss=0.009207, ecapa_loss=0.0001393, whisper_loss=0.07561, over 18792.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01037, ecapa_loss=0.000147, whisper_loss=0.09015, over 3863286.97 frames. 
], batch size: 70, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:59:34,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3060270.0, ans=0.125 2024-08-15 06:59:36,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3060270.0, ans=0.125 2024-08-15 06:59:39,779 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 22 from Vox, 33 from AS 2024-08-15 06:59:40,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3060270.0, ans=0.125 2024-08-15 06:59:41,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3060270.0, ans=0.125 2024-08-15 06:59:55,326 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.29 vs. limit=10.0 2024-08-15 07:00:04,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3060470.0, ans=0.125 2024-08-15 07:00:29,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3060570.0, ans=0.2 2024-08-15 07:00:33,408 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 18 from LS+wenet, 15 from Vox, 21 from AS 2024-08-15 07:00:36,613 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 21 from LS+wenet, 14 from Vox, 19 from AS 2024-08-15 07:00:51,272 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.353e+01 2.609e+01 2.862e+01 3.979e+01, threshold=5.218e+01, percent-clipped=0.0 2024-08-15 07:00:54,838 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 1750, loss[loss=0.1001, beats_loss=0.01223, ecapa_loss=0.000125, whisper_loss=0.08664, over 18169.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01034, ecapa_loss=0.0001468, whisper_loss=0.09069, over 3868526.31 frames. ], batch size: 74, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 07:00:58,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3060770.0, ans=0.0
2024-08-15 07:01:15,022 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 25 from Vox, 29 from AS
2024-08-15 07:01:29,791 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 28 from Vox, 38 from AS
2024-08-15 07:01:37,886 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 16 from LS+wenet, 15 from Vox, 31 from AS
2024-08-15 07:01:42,815 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 18 from Vox, 41 from AS
2024-08-15 07:01:43,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3061070.0, ans=0.0
2024-08-15 07:01:49,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3061070.0, ans=0.05
2024-08-15 07:01:52,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3061070.0, ans=0.125
2024-08-15 07:02:06,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3061170.0, ans=0.125
2024-08-15 07:02:09,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3061170.0, ans=0.0
2024-08-15 07:02:15,300 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 1800, loss[loss=0.1016, beats_loss=0.01099, ecapa_loss=0.0001194, whisper_loss=0.08942, over 23645.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01036, ecapa_loss=0.0001468, whisper_loss=0.09022, over 3875945.08 frames.
], batch size: 89, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 07:02:40,209 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 33 from LS+wenet, 16 from Vox, 30 from AS
2024-08-15 07:02:59,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3061470.0, ans=0.0
2024-08-15 07:03:00,438 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 28 from Vox, 39 from AS
2024-08-15 07:03:06,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3061570.0, ans=0.1
2024-08-15 07:03:20,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3061670.0, ans=0.125
2024-08-15 07:03:31,476 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.694e+01 2.281e+01 2.524e+01 2.715e+01 4.496e+01, threshold=5.048e+01, percent-clipped=0.0
2024-08-15 07:03:35,171 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 1850, loss[loss=0.09391, beats_loss=0.01163, ecapa_loss=0.0001248, whisper_loss=0.08103, over 17535.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01032, ecapa_loss=0.0001479, whisper_loss=0.09081, over 3879181.96 frames. ], batch size: 68, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 07:04:10,748 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0
2024-08-15 07:04:41,770 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 15 from LS+wenet, 22 from Vox, 34 from AS
2024-08-15 07:04:43,308 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts.
26 from LS+wenet, 18 from Vox, 36 from AS
2024-08-15 07:04:45,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3062170.0, ans=0.5
2024-08-15 07:04:49,286 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.64 vs. limit=15.0
2024-08-15 07:04:55,908 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 1900, loss[loss=0.1018, beats_loss=0.01264, ecapa_loss=0.0001213, whisper_loss=0.08793, over 23324.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01041, ecapa_loss=0.0001486, whisper_loss=0.09056, over 3867096.13 frames. ], batch size: 93, lr: 2.85e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 07:05:00,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3062270.0, ans=0.0
2024-08-15 07:05:07,621 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 from AS
2024-08-15 07:05:19,098 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 16 from Vox, 31 from AS
2024-08-15 07:05:42,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3062470.0, ans=0.1
2024-08-15 07:05:45,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3062570.0, ans=0.125
2024-08-15 07:05:48,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3062570.0, ans=0.0
2024-08-15 07:05:48,420 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0
2024-08-15 07:05:58,824 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts.
17 from LS+wenet, 9 from Vox, 28 from AS
2024-08-15 07:06:03,034 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 from AS
2024-08-15 07:06:08,784 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0
2024-08-15 07:06:12,426 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.324e+01 2.578e+01 2.950e+01 1.570e+02, threshold=5.156e+01, percent-clipped=1.0
2024-08-15 07:06:15,760 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 1950, loss[loss=0.1071, beats_loss=0.01175, ecapa_loss=0.0001217, whisper_loss=0.09417, over 22863.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01048, ecapa_loss=0.0001473, whisper_loss=0.09017, over 3837905.04 frames. ], batch size: 87, lr: 2.85e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 07:06:33,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3062870.0, ans=0.0
2024-08-15 07:06:39,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3062870.0, ans=0.05
2024-08-15 07:06:43,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3062870.0, ans=0.0
2024-08-15 07:07:06,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3063070.0, ans=0.0
2024-08-15 07:07:09,995 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts.
17 from LS+wenet, 34 from Vox, 31 from AS
2024-08-15 07:07:13,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3063070.0, ans=0.07
2024-08-15 07:07:35,132 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 2000, loss[loss=0.1106, beats_loss=0.007892, ecapa_loss=0.0001741, whisper_loss=0.101, over 18981.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01049, ecapa_loss=0.0001471, whisper_loss=0.08956, over 3829061.44 frames. ], batch size: 78, lr: 2.85e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 07:07:41,386 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 15 from LS+wenet, 19 from Vox, 35 from AS
2024-08-15 07:08:08,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3063470.0, ans=0.125
2024-08-15 07:08:31,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3063570.0, ans=0.125
2024-08-15 07:08:33,739 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 20 from Vox, 35 from AS
2024-08-15 07:08:39,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3063670.0, ans=0.2
2024-08-15 07:08:51,033 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.304e+01 2.556e+01 2.865e+01 6.565e+01, threshold=5.113e+01, percent-clipped=1.0
2024-08-15 07:08:54,459 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 2050, loss[loss=0.1314, beats_loss=0.007998, ecapa_loss=0.0001633, whisper_loss=0.1218, over 18327.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01048, ecapa_loss=0.000147, whisper_loss=0.08991, over 3837837.78 frames.
], batch size: 71, lr: 2.85e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 07:08:56,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3063770.0, ans=0.1
2024-08-15 07:08:56,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3063770.0, ans=0.1
2024-08-15 07:09:09,724 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 26 from LS+wenet, 15 from Vox, 22 from AS
2024-08-15 07:09:11,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3063870.0, ans=0.125
2024-08-15 07:09:14,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3063870.0, ans=0.0
2024-08-15 07:09:17,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3063870.0, ans=0.125
2024-08-15 07:09:34,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3063970.0, ans=0.125
2024-08-15 07:09:48,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3064070.0, ans=0.125
2024-08-15 07:09:48,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3064070.0, ans=0.125
2024-08-15 07:09:59,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3064170.0, ans=0.1
2024-08-15 07:10:00,221 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts.
18 from LS+wenet, 13 from Vox, 23 from AS
2024-08-15 07:10:12,789 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 2100, loss[loss=0.1086, beats_loss=0.008476, ecapa_loss=0.0001627, whisper_loss=0.09846, over 22570.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01046, ecapa_loss=0.0001463, whisper_loss=0.09072, over 3848234.21 frames. ], batch size: 90, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:10:21,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3064270.0, ans=0.2
2024-08-15 07:10:32,058 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 16 from LS+wenet, 18 from Vox, 32 from AS
2024-08-15 07:10:47,896 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 25 from LS+wenet, 16 from Vox, 31 from AS
2024-08-15 07:11:02,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3064570.0, ans=0.1
2024-08-15 07:11:07,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3064570.0, ans=0.025
2024-08-15 07:11:18,229 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 from AS
2024-08-15 07:11:28,744 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.342e+01 2.621e+01 2.964e+01 3.863e+02, threshold=5.241e+01, percent-clipped=3.0
2024-08-15 07:11:32,857 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 2150, loss[loss=0.1013, beats_loss=0.01243, ecapa_loss=0.0001256, whisper_loss=0.0876, over 21438.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01054, ecapa_loss=0.0001455, whisper_loss=0.09073, over 3849971.53 frames.
], batch size: 85, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:11:47,908 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.13 vs. limit=22.5
2024-08-15 07:11:49,637 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 17 from Vox, 25 from AS
2024-08-15 07:11:50,462 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.62 vs. limit=15.0
2024-08-15 07:12:00,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3064870.0, ans=0.125
2024-08-15 07:12:00,508 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.781e+00
2024-08-15 07:12:08,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3064970.0, ans=0.1
2024-08-15 07:12:12,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3064970.0, ans=0.05
2024-08-15 07:12:15,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3064970.0, ans=0.125
2024-08-15 07:12:34,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3065070.0, ans=0.1
2024-08-15 07:12:38,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3065170.0, ans=0.1
2024-08-15 07:12:53,952 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 2200, loss[loss=0.09018, beats_loss=0.009588, ecapa_loss=0.0001559, whisper_loss=0.07904, over 18388.00 frames.
], tot_loss[loss=0.1028, beats_loss=0.0106, ecapa_loss=0.0001458, whisper_loss=0.09078, over 3836104.30 frames. ], batch size: 74, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:12:59,393 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.82 vs. limit=15.0
2024-08-15 07:13:09,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3065370.0, ans=0.125
2024-08-15 07:13:10,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3065370.0, ans=0.2
2024-08-15 07:13:32,098 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 19 from LS+wenet, 16 from Vox, 31 from AS
2024-08-15 07:13:36,775 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 from AS
2024-08-15 07:13:54,824 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.18 vs. limit=12.0
2024-08-15 07:13:59,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3065670.0, ans=0.125
2024-08-15 07:14:09,531 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.737e+01 2.287e+01 2.531e+01 2.773e+01 4.088e+01, threshold=5.062e+01, percent-clipped=0.0
2024-08-15 07:14:12,886 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 2250, loss[loss=0.0931, beats_loss=0.01233, ecapa_loss=0.0001652, whisper_loss=0.07912, over 20024.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01068, ecapa_loss=0.000147, whisper_loss=0.09042, over 3850647.39 frames.
], batch size: 84, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:14:23,959 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.83 vs. limit=15.0
2024-08-15 07:14:38,908 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 from AS
2024-08-15 07:14:42,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3065870.0, ans=0.125
2024-08-15 07:14:54,068 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.63 vs. limit=15.0
2024-08-15 07:15:24,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3066170.0, ans=0.125
2024-08-15 07:15:31,629 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 18 from LS+wenet, 30 from Vox, 28 from AS
2024-08-15 07:15:34,199 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 2300, loss[loss=0.07724, beats_loss=0.00913, ecapa_loss=0.000133, whisper_loss=0.06678, over 16482.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01066, ecapa_loss=0.0001477, whisper_loss=0.09027, over 3859422.72 frames. ], batch size: 62, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:16:00,619 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0
2024-08-15 07:16:00,751 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.97 vs.
limit=5.0
2024-08-15 07:16:06,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3066470.0, ans=0.0
2024-08-15 07:16:09,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3066470.0, ans=0.125
2024-08-15 07:16:11,699 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.09 vs. limit=10.0
2024-08-15 07:16:22,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3066570.0, ans=0.07
2024-08-15 07:16:22,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3066570.0, ans=0.0
2024-08-15 07:16:29,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3066570.0, ans=0.125
2024-08-15 07:16:32,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3066570.0, ans=0.125
2024-08-15 07:16:45,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3066670.0, ans=0.125
2024-08-15 07:16:46,136 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts.
15 from LS+wenet, 15 from Vox, 23 from AS
2024-08-15 07:16:48,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3066670.0, ans=0.125
2024-08-15 07:16:50,541 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.352e+01 2.574e+01 2.900e+01 4.946e+01, threshold=5.147e+01, percent-clipped=0.0
2024-08-15 07:16:53,860 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 2350, loss[loss=0.09658, beats_loss=0.01037, ecapa_loss=0.0001493, whisper_loss=0.08472, over 22326.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01068, ecapa_loss=0.0001488, whisper_loss=0.09043, over 3882775.67 frames. ], batch size: 91, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:17:03,981 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 14 from Vox, 34 from AS
2024-08-15 07:17:38,957 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 40 from LS+wenet, 18 from Vox, 32 from AS
2024-08-15 07:17:47,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3067070.0, ans=0.125
2024-08-15 07:17:57,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3067170.0, ans=0.125
2024-08-15 07:17:59,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3067170.0, ans=0.2
2024-08-15 07:18:02,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3067170.0, ans=0.125
2024-08-15 07:18:04,148 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.26 vs.
limit=15.0
2024-08-15 07:18:05,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3067170.0, ans=0.125
2024-08-15 07:18:12,591 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 2400, loss[loss=0.1011, beats_loss=0.009426, ecapa_loss=0.0001507, whisper_loss=0.09016, over 23432.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01062, ecapa_loss=0.000149, whisper_loss=0.09134, over 3895433.21 frames. ], batch size: 92, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:18:19,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3067270.0, ans=0.2
2024-08-15 07:18:19,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3067270.0, ans=0.125
2024-08-15 07:18:22,055 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 25 from LS+wenet, 17 from Vox, 22 from AS
2024-08-15 07:18:26,245 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 from AS
2024-08-15 07:18:34,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3067370.0, ans=0.125
2024-08-15 07:18:44,420 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 from AS
2024-08-15 07:18:49,441 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 15 from Vox, 27 from AS
2024-08-15 07:18:56,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3067470.0, ans=0.035
2024-08-15 07:19:08,290 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts.
26 from LS+wenet, 12 from Vox, 28 from AS
2024-08-15 07:19:21,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3067670.0, ans=0.125
2024-08-15 07:19:24,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3067670.0, ans=0.0
2024-08-15 07:19:26,460 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.236e+01 2.451e+01 2.781e+01 1.373e+02, threshold=4.902e+01, percent-clipped=2.0
2024-08-15 07:19:27,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3067670.0, ans=0.0
2024-08-15 07:19:29,692 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 2450, loss[loss=0.0824, beats_loss=0.01136, ecapa_loss=0.0001483, whisper_loss=0.06956, over 17429.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01061, ecapa_loss=0.0001492, whisper_loss=0.0907, over 3870631.60 frames. ], batch size: 70, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:19:33,950 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.61 vs. limit=22.5
2024-08-15 07:19:43,263 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 13 from Vox, 30 from AS
2024-08-15 07:19:46,410 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts.
32 from LS+wenet, 21 from Vox, 37 from AS
2024-08-15 07:20:02,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3067970.0, ans=0.125
2024-08-15 07:20:03,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3067970.0, ans=0.2
2024-08-15 07:20:03,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3067970.0, ans=0.0
2024-08-15 07:20:13,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3067970.0, ans=0.1
2024-08-15 07:20:16,742 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 25 from Vox, 32 from AS
2024-08-15 07:20:28,102 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.47 vs. limit=12.0
2024-08-15 07:20:30,349 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 18 from LS+wenet, 22 from Vox, 43 from AS
2024-08-15 07:20:41,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3068170.0, ans=0.125
2024-08-15 07:20:47,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3068270.0, ans=0.2
2024-08-15 07:20:48,716 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 2500, loss[loss=0.08594, beats_loss=0.01275, ecapa_loss=0.0001102, whisper_loss=0.07209, over 14554.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01054, ecapa_loss=0.0001492, whisper_loss=0.09105, over 3881374.19 frames. ], batch size: 58, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:20:50,471 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts.
24 from LS+wenet, 25 from Vox, 39 from AS
2024-08-15 07:20:57,753 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.68 vs. limit=15.0
2024-08-15 07:20:59,282 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.75 vs. limit=15.0
2024-08-15 07:21:07,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3068370.0, ans=10.0
2024-08-15 07:21:11,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3068370.0, ans=0.125
2024-08-15 07:22:04,492 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.249e+01 2.498e+01 2.918e+01 4.518e+01, threshold=4.995e+01, percent-clipped=0.0
2024-08-15 07:22:05,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3068670.0, ans=0.0
2024-08-15 07:22:07,191 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 2550, loss[loss=0.1043, beats_loss=0.01037, ecapa_loss=0.0001547, whisper_loss=0.09237, over 17298.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01048, ecapa_loss=0.0001492, whisper_loss=0.09142, over 3895319.59 frames. ], batch size: 70, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:22:11,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3068770.0, ans=0.0
2024-08-15 07:22:24,851 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts.
34 from LS+wenet, 16 from Vox, 40 from AS
2024-08-15 07:22:37,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3068970.0, ans=0.07
2024-08-15 07:22:40,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3068970.0, ans=0.0
2024-08-15 07:23:24,221 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 19 from Vox, 45 from AS
2024-08-15 07:23:25,424 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 2600, loss[loss=0.1029, beats_loss=0.01049, ecapa_loss=0.0001186, whisper_loss=0.09123, over 23905.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01058, ecapa_loss=0.0001486, whisper_loss=0.0906, over 3886928.73 frames. ], batch size: 91, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:23:48,216 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 22 from LS+wenet, 28 from Vox, 36 from AS
2024-08-15 07:24:00,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3069470.0, ans=0.125
2024-08-15 07:24:13,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3069570.0, ans=0.125
2024-08-15 07:24:19,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3069570.0, ans=0.1
2024-08-15 07:24:36,121 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 15 from LS+wenet, 13 from Vox, 32 from AS
2024-08-15 07:24:40,329 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.364e+01 2.551e+01 2.920e+01 2.244e+02, threshold=5.103e+01, percent-clipped=2.0
2024-08-15 07:24:42,276 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts.
23 from LS+wenet, 26 from Vox, 39 from AS
2024-08-15 07:24:43,302 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 2650, loss[loss=0.08808, beats_loss=0.01072, ecapa_loss=0.0001371, whisper_loss=0.07598, over 22139.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01055, ecapa_loss=0.0001486, whisper_loss=0.09065, over 3899696.87 frames. ], batch size: 88, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:25:01,718 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0
2024-08-15 07:25:03,691 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 18 from Vox, 39 from AS
2024-08-15 07:25:26,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3069970.0, ans=0.125
2024-08-15 07:25:31,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3070070.0, ans=10.0
2024-08-15 07:25:37,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=3070070.0, ans=0.02
2024-08-15 07:25:46,431 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.57 vs. limit=12.0
2024-08-15 07:25:51,680 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 19 from LS+wenet, 24 from Vox, 36 from AS
2024-08-15 07:25:54,583 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 from AS
2024-08-15 07:26:02,131 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 2700, loss[loss=0.08881, beats_loss=0.01189, ecapa_loss=0.0001637, whisper_loss=0.07529, over 13807.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01065, ecapa_loss=0.0001478, whisper_loss=0.08936, over 3875654.15 frames.
], batch size: 58, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:26:04,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3070270.0, ans=0.1 2024-08-15 07:26:06,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3070270.0, ans=0.1 2024-08-15 07:26:11,035 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-15 07:26:16,993 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-15 07:26:21,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3070370.0, ans=0.125 2024-08-15 07:26:29,318 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-15 07:26:50,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3070570.0, ans=0.0 2024-08-15 07:26:53,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3070570.0, ans=0.125 2024-08-15 07:27:09,887 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.50 vs. limit=22.5 2024-08-15 07:27:15,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3070670.0, ans=0.125 2024-08-15 07:27:17,437 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.42 vs. 
limit=15.0 2024-08-15 07:27:17,842 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.236e+01 2.522e+01 2.732e+01 2.329e+02, threshold=5.045e+01, percent-clipped=1.0 2024-08-15 07:27:21,387 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 2750, loss[loss=0.1039, beats_loss=0.01159, ecapa_loss=0.0001543, whisper_loss=0.09074, over 22680.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01057, ecapa_loss=0.0001483, whisper_loss=0.0903, over 3858831.58 frames. ], batch size: 92, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:27:25,113 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-15 07:27:26,188 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.37 vs. limit=6.0 2024-08-15 07:27:46,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3070870.0, ans=0.2 2024-08-15 07:27:51,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3070970.0, ans=0.125 2024-08-15 07:27:55,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3070970.0, ans=0.035 2024-08-15 07:28:11,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3071070.0, ans=0.125 2024-08-15 07:28:14,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3071070.0, ans=0.1 2024-08-15 07:28:24,351 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.73 vs. limit=15.0 2024-08-15 07:28:32,485 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
27 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-15 07:28:35,263 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 27 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-15 07:28:39,935 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 2800, loss[loss=0.09823, beats_loss=0.01113, ecapa_loss=0.0001775, whisper_loss=0.08533, over 20897.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01053, ecapa_loss=0.0001483, whisper_loss=0.09044, over 3834537.55 frames. ], batch size: 88, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:28:40,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3071270.0, ans=0.125 2024-08-15 07:28:51,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3071270.0, ans=0.0 2024-08-15 07:28:51,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3071270.0, ans=0.0 2024-08-15 07:29:00,490 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0 2024-08-15 07:29:11,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3071470.0, ans=0.125 2024-08-15 07:29:13,157 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=17.17 vs. 
limit=15.0 2024-08-15 07:29:32,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3071570.0, ans=0.0 2024-08-15 07:29:43,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3071670.0, ans=0.125 2024-08-15 07:29:45,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3071670.0, ans=0.1 2024-08-15 07:29:57,131 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.358e+01 2.558e+01 2.902e+01 4.968e+01, threshold=5.116e+01, percent-clipped=0.0 2024-08-15 07:30:00,184 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 2850, loss[loss=0.09256, beats_loss=0.01242, ecapa_loss=0.0001217, whisper_loss=0.07892, over 16069.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01055, ecapa_loss=0.0001474, whisper_loss=0.09099, over 3838332.65 frames. ], batch size: 63, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:30:02,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3071770.0, ans=0.0 2024-08-15 07:30:28,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3071870.0, ans=0.125 2024-08-15 07:30:39,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3071970.0, ans=0.125 2024-08-15 07:31:15,180 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
33 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-15 07:31:17,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3072270.0, ans=0.125 2024-08-15 07:31:18,318 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 2900, loss[loss=0.09913, beats_loss=0.01128, ecapa_loss=0.0001644, whisper_loss=0.0862, over 18108.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01051, ecapa_loss=0.0001495, whisper_loss=0.0909, over 3842325.13 frames. ], batch size: 73, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:31:18,585 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-15 07:31:49,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3072470.0, ans=0.125 2024-08-15 07:31:49,259 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.41 vs. limit=15.0 2024-08-15 07:32:03,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3072470.0, ans=0.1 2024-08-15 07:32:05,978 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 21 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-15 07:32:11,330 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
20 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-15 07:32:24,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3072670.0, ans=0.1 2024-08-15 07:32:32,613 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.406e+01 2.599e+01 2.781e+01 4.544e+01, threshold=5.197e+01, percent-clipped=0.0 2024-08-15 07:32:35,984 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 2950, loss[loss=0.1035, beats_loss=0.01053, ecapa_loss=0.0001535, whisper_loss=0.09148, over 16428.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01052, ecapa_loss=0.0001485, whisper_loss=0.09118, over 3865904.62 frames. ], batch size: 63, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:32:54,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3072870.0, ans=0.125 2024-08-15 07:32:58,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3072870.0, ans=0.2 2024-08-15 07:33:15,620 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.06 vs. limit=22.5 2024-08-15 07:33:17,627 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 33 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-15 07:33:29,573 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 24 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-15 07:33:33,859 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-15 07:33:48,956 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 3000, loss[loss=0.1136, beats_loss=0.01022, ecapa_loss=0.0001337, whisper_loss=0.102, over 22715.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01056, ecapa_loss=0.0001488, whisper_loss=0.09117, over 3875998.43 frames. 
], batch size: 88, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:33:48,957 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-15 07:34:30,910 INFO [train_multi_KD3.py:1149] (2/4) Epoch 22, validation on ASR_libri: loss=0.2522, beats_loss=0, ecapa_loss=0.0005255, whisper_loss=0.2469, over 922467.00 frames. 2024-08-15 07:34:46,541 INFO [train_multi_KD3.py:1149] (2/4) Epoch 22, validation on SV_voxceleb1: loss=0.004113, beats_loss=0, ecapa_loss=0.0004113, whisper_loss=0, over 939242.00 frames. 2024-08-15 07:36:48,663 INFO [train_multi_KD3.py:1149] (2/4) Epoch 22, validation on AT_audioset: loss=0.02334, beats_loss=0.02334, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 07:36:48,667 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-15 07:36:54,050 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 24 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-15 07:37:11,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3073370.0, ans=0.025 2024-08-15 07:37:14,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3073370.0, ans=0.1 2024-08-15 07:37:19,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3073470.0, ans=0.125 2024-08-15 07:37:24,766 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.83 vs. 
limit=15.0 2024-08-15 07:37:34,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3073570.0, ans=0.125 2024-08-15 07:37:56,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3073670.0, ans=0.2 2024-08-15 07:37:59,794 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.341e+01 2.586e+01 2.957e+01 4.096e+01, threshold=5.172e+01, percent-clipped=0.0 2024-08-15 07:38:02,527 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 3050, loss[loss=0.08858, beats_loss=0.007395, ecapa_loss=0.0001567, whisper_loss=0.07962, over 15528.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01059, ecapa_loss=0.0001488, whisper_loss=0.09128, over 3899534.52 frames. ], batch size: 60, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:38:19,653 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.60 vs. limit=15.0 2024-08-15 07:38:30,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3073970.0, ans=0.125 2024-08-15 07:39:00,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3074170.0, ans=0.0 2024-08-15 07:39:02,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3074170.0, ans=0.125 2024-08-15 07:39:05,359 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 9 from LS+wenet, 28 from Vox, 21 fro AS 2024-08-15 07:39:10,766 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
29 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-15 07:39:11,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3074170.0, ans=0.125 2024-08-15 07:39:14,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3074270.0, ans=0.0 2024-08-15 07:39:15,082 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 3100, loss[loss=0.0945, beats_loss=0.01101, ecapa_loss=0.0001674, whisper_loss=0.08181, over 21158.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0106, ecapa_loss=0.0001491, whisper_loss=0.09115, over 3892179.17 frames. ], batch size: 87, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:39:19,287 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 14 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-15 07:39:35,375 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-15 07:39:42,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3074470.0, ans=0.2 2024-08-15 07:39:50,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3074470.0, ans=0.0 2024-08-15 07:39:52,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3074470.0, ans=0.2 2024-08-15 07:39:59,483 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-15 07:40:00,952 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
21 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-15 07:40:01,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3074570.0, ans=0.0 2024-08-15 07:40:05,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3074570.0, ans=0.125 2024-08-15 07:40:07,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3074570.0, ans=0.125 2024-08-15 07:40:23,589 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.721e+01 2.264e+01 2.554e+01 2.842e+01 4.812e+01, threshold=5.107e+01, percent-clipped=0.0 2024-08-15 07:40:26,734 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 3150, loss[loss=0.1089, beats_loss=0.01189, ecapa_loss=0.0001407, whisper_loss=0.09557, over 23405.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.0001503, whisper_loss=0.09101, over 3872123.54 frames. ], batch size: 93, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:40:30,103 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-15 07:40:33,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3074770.0, ans=0.125 2024-08-15 07:40:48,096 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.10 vs. 
limit=22.5 2024-08-15 07:40:55,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3074970.0, ans=0.125 2024-08-15 07:41:14,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3075070.0, ans=0.2 2024-08-15 07:41:16,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3075070.0, ans=0.0 2024-08-15 07:41:21,845 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.86 vs. limit=15.0 2024-08-15 07:41:36,135 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2024-08-15 07:41:39,647 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 3200, loss[loss=0.1169, beats_loss=0.0102, ecapa_loss=0.0001629, whisper_loss=0.105, over 21987.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01055, ecapa_loss=0.0001506, whisper_loss=0.09143, over 3869415.58 frames. ], batch size: 88, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:41:49,953 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 07:41:50,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3075270.0, ans=0.1 2024-08-15 07:42:06,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3075370.0, ans=0.125 2024-08-15 07:42:10,686 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 32 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 07:42:12,070 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
22 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-15 07:42:16,323 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-15 07:42:23,836 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-15 07:42:24,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3075570.0, ans=10.0 2024-08-15 07:42:48,805 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.323e+01 2.639e+01 2.854e+01 4.930e+01, threshold=5.277e+01, percent-clipped=0.0 2024-08-15 07:42:51,931 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 3250, loss[loss=0.09741, beats_loss=0.01135, ecapa_loss=0.0001764, whisper_loss=0.08429, over 20946.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01059, ecapa_loss=0.0001498, whisper_loss=0.09173, over 3886169.50 frames. ], batch size: 87, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:43:07,643 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-15 07:43:56,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3076170.0, ans=0.125 2024-08-15 07:44:01,715 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 3300, loss[loss=0.08658, beats_loss=0.01275, ecapa_loss=0.0001312, whisper_loss=0.07251, over 17173.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01059, ecapa_loss=0.0001491, whisper_loss=0.09161, over 3884940.65 frames. 
], batch size: 69, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:44:02,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3076270.0, ans=15.0 2024-08-15 07:44:15,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3076370.0, ans=0.125 2024-08-15 07:44:16,977 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-15 07:44:26,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3076370.0, ans=0.125 2024-08-15 07:44:52,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3076570.0, ans=0.125 2024-08-15 07:44:55,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3076570.0, ans=0.125 2024-08-15 07:44:57,683 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-15 07:45:11,267 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.361e+01 2.617e+01 2.908e+01 9.847e+01, threshold=5.233e+01, percent-clipped=1.0 2024-08-15 07:45:13,924 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 3350, loss[loss=0.1372, beats_loss=0.009751, ecapa_loss=0.0001389, whisper_loss=0.126, over 17345.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01064, ecapa_loss=0.0001488, whisper_loss=0.09141, over 3895642.85 frames. ], batch size: 65, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:45:21,352 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 27 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-15 07:45:35,384 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
25 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-15 07:45:36,739 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-15 07:45:42,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3076970.0, ans=0.125 2024-08-15 07:45:51,033 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-15 07:45:59,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3077070.0, ans=0.2 2024-08-15 07:46:05,704 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 20 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-15 07:46:07,795 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.38 vs. limit=6.0 2024-08-15 07:46:10,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3077170.0, ans=0.125 2024-08-15 07:46:18,883 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.12 vs. limit=15.0 2024-08-15 07:46:19,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3077170.0, ans=0.0 2024-08-15 07:46:24,644 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 3400, loss[loss=0.09064, beats_loss=0.01208, ecapa_loss=0.0001284, whisper_loss=0.07727, over 21645.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01058, ecapa_loss=0.0001484, whisper_loss=0.09164, over 3891730.89 frames. ], batch size: 86, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:46:34,587 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
21 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-15 07:46:37,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3077370.0, ans=0.125 2024-08-15 07:46:38,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3077370.0, ans=0.125 2024-08-15 07:46:41,668 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-15 07:46:41,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3077370.0, ans=0.05 2024-08-15 07:47:03,111 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 18 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-15 07:47:20,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3077670.0, ans=0.1 2024-08-15 07:47:32,383 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.325e+01 2.575e+01 2.904e+01 4.420e+01, threshold=5.150e+01, percent-clipped=0.0 2024-08-15 07:47:35,142 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 3450, loss[loss=0.07474, beats_loss=0.01108, ecapa_loss=0.0001526, whisper_loss=0.06213, over 18900.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01066, ecapa_loss=0.0001499, whisper_loss=0.0905, over 3869903.72 frames. ], batch size: 77, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:47:46,078 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-15 07:47:53,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3077870.0, ans=0.0 2024-08-15 07:48:18,626 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
18 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 07:48:48,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3078270.0, ans=0.125 2024-08-15 07:48:49,227 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 3500, loss[loss=0.09531, beats_loss=0.01044, ecapa_loss=0.0001936, whisper_loss=0.08293, over 21125.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01063, ecapa_loss=0.0001516, whisper_loss=0.09061, over 3867640.59 frames. ], batch size: 90, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:49:05,843 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.55 vs. limit=22.5 2024-08-15 07:49:06,704 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 24 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-15 07:49:07,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3078370.0, ans=0.125 2024-08-15 07:49:13,680 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 22 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-15 07:49:17,881 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-15 07:49:23,491 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 28 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-15 07:49:25,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3078470.0, ans=0.1 2024-08-15 07:49:37,091 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.23 vs. 
limit=22.5 2024-08-15 07:49:57,450 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.337e+01 2.595e+01 2.865e+01 3.542e+01, threshold=5.191e+01, percent-clipped=0.0 2024-08-15 07:49:59,977 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 3550, loss[loss=0.09741, beats_loss=0.01043, ecapa_loss=0.00015, whisper_loss=0.08548, over 18997.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01064, ecapa_loss=0.0001511, whisper_loss=0.0907, over 3872202.58 frames. ], batch size: 78, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:50:02,442 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0 2024-08-15 07:50:22,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3078870.0, ans=0.125 2024-08-15 07:50:25,354 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-15 07:50:37,324 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-15 07:51:12,188 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 3600, loss[loss=0.119, beats_loss=0.01035, ecapa_loss=0.0001596, whisper_loss=0.107, over 16399.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.0001505, whisper_loss=0.09086, over 3880551.74 frames. ], batch size: 65, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:51:28,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3079370.0, ans=0.0 2024-08-15 07:51:35,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3079370.0, ans=0.0 2024-08-15 07:51:47,191 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
28 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-15 07:51:49,835 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 14 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-15 07:52:01,850 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-15 07:52:21,678 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.633e+01 2.253e+01 2.466e+01 2.769e+01 4.270e+01, threshold=4.932e+01, percent-clipped=0.0 2024-08-15 07:52:21,836 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-15 07:52:24,722 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 3650, loss[loss=0.09072, beats_loss=0.01187, ecapa_loss=0.000142, whisper_loss=0.07743, over 17598.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01066, ecapa_loss=0.0001505, whisper_loss=0.09067, over 3891937.05 frames. ], batch size: 71, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:52:27,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3079770.0, ans=0.125 2024-08-15 07:52:43,944 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.62 vs. 
limit=22.5 2024-08-15 07:53:07,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3079970.0, ans=0.2 2024-08-15 07:53:12,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3080070.0, ans=0.125 2024-08-15 07:53:14,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3080070.0, ans=0.0 2024-08-15 07:53:27,739 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.295e+01 2024-08-15 07:53:38,196 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 3700, loss[loss=0.1265, beats_loss=0.008157, ecapa_loss=0.0001936, whisper_loss=0.1164, over 21781.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01058, ecapa_loss=0.0001506, whisper_loss=0.09097, over 3878503.10 frames. ], batch size: 88, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:53:42,345 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 22 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-15 07:54:05,849 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 28 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-15 07:54:16,301 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.50 vs. limit=15.0 2024-08-15 07:54:27,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3080570.0, ans=0.125 2024-08-15 07:54:37,844 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
27 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 07:54:38,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3080670.0, ans=0.125 2024-08-15 07:54:43,148 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.84 vs. limit=15.0 2024-08-15 07:54:45,107 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.763e+01 2.379e+01 2.603e+01 2.924e+01 1.234e+02, threshold=5.207e+01, percent-clipped=1.0 2024-08-15 07:54:47,817 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 3750, loss[loss=0.1208, beats_loss=0.008802, ecapa_loss=0.0001557, whisper_loss=0.1104, over 23532.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01054, ecapa_loss=0.0001513, whisper_loss=0.09137, over 3863174.66 frames. ], batch size: 92, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:54:51,981 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-15 07:54:57,260 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-15 07:55:03,050 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 7 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 07:55:03,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3080870.0, ans=0.0 2024-08-15 07:55:14,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3080970.0, ans=0.015 2024-08-15 07:55:18,071 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-15 07:55:32,002 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 24 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-15 07:55:50,115 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
27 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-15 07:55:52,930 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-15 07:55:56,988 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 3800, loss[loss=0.07483, beats_loss=0.01407, ecapa_loss=0.0001064, whisper_loss=0.0597, over 23595.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01062, ecapa_loss=0.0001508, whisper_loss=0.09108, over 3871993.59 frames. ], batch size: 94, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:56:37,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3081570.0, ans=0.125 2024-08-15 07:56:52,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3081670.0, ans=0.0 2024-08-15 07:57:02,310 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.325e+01 2.607e+01 2.959e+01 1.127e+02, threshold=5.215e+01, percent-clipped=1.0 2024-08-15 07:57:05,429 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 3850, loss[loss=0.1021, beats_loss=0.01024, ecapa_loss=0.0001512, whisper_loss=0.09034, over 15390.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01066, ecapa_loss=0.000152, whisper_loss=0.0905, over 3866737.90 frames. ], batch size: 60, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:57:12,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3081770.0, ans=0.0 2024-08-15 07:57:14,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3081770.0, ans=0.2 2024-08-15 07:57:15,206 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
16 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-15 07:57:18,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3081870.0, ans=0.125 2024-08-15 07:57:22,563 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=15.0 2024-08-15 07:57:35,216 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-15 07:57:42,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3081970.0, ans=0.2 2024-08-15 07:57:42,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3081970.0, ans=0.1 2024-08-15 07:58:02,210 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 17 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-15 07:58:03,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3082170.0, ans=0.2 2024-08-15 07:58:05,200 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.92 vs. limit=15.0 2024-08-15 07:58:14,038 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 3900, loss[loss=0.08482, beats_loss=0.01159, ecapa_loss=0.0001759, whisper_loss=0.07147, over 14396.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01061, ecapa_loss=0.0001521, whisper_loss=0.09116, over 3851392.77 frames. ], batch size: 59, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:58:40,172 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.655e-03 2024-08-15 07:58:50,840 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
14 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-15 07:59:04,739 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.07 vs. limit=15.0 2024-08-15 07:59:10,115 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.384e+00 2024-08-15 07:59:17,840 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-15 07:59:19,090 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.371e+01 2.557e+01 2.985e+01 4.331e+01, threshold=5.113e+01, percent-clipped=0.0 2024-08-15 07:59:22,091 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 3950, loss[loss=0.1086, beats_loss=0.008997, ecapa_loss=0.0001675, whisper_loss=0.09796, over 15770.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01059, ecapa_loss=0.0001513, whisper_loss=0.09175, over 3877949.18 frames. ], batch size: 62, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:59:24,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=3082770.0, ans=12.0 2024-08-15 07:59:32,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3082770.0, ans=0.025 2024-08-15 07:59:33,483 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 27 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-15 07:59:43,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3082870.0, ans=0.1 2024-08-15 07:59:53,363 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-15 07:59:58,641 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
16 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-15 08:00:05,862 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.54 vs. limit=15.0 2024-08-15 08:00:24,550 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.10 vs. limit=22.5 2024-08-15 08:00:31,775 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 4000, loss[loss=0.08123, beats_loss=0.01073, ecapa_loss=0.0001323, whisper_loss=0.06917, over 15320.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01056, ecapa_loss=0.0001517, whisper_loss=0.0919, over 3891827.00 frames. ], batch size: 59, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:00:56,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3083370.0, ans=10.0 2024-08-15 08:01:03,073 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.98 vs. limit=15.0 2024-08-15 08:01:26,548 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 14 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-15 08:01:35,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3083670.0, ans=0.1 2024-08-15 08:01:39,010 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.375e+01 2.633e+01 2.839e+01 4.243e+01, threshold=5.267e+01, percent-clipped=0.0 2024-08-15 08:01:42,096 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 4050, loss[loss=0.1155, beats_loss=0.01016, ecapa_loss=0.0001474, whisper_loss=0.1039, over 19952.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01054, ecapa_loss=0.0001517, whisper_loss=0.09216, over 3851772.84 frames. 
], batch size: 77, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:01:45,250 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-15 08:01:49,309 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3083770.0, ans=0.05 2024-08-15 08:02:00,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3083870.0, ans=0.025 2024-08-15 08:02:00,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3083870.0, ans=0.07 2024-08-15 08:02:07,198 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-15 08:02:14,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3083970.0, ans=0.125 2024-08-15 08:02:24,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3084070.0, ans=0.0 2024-08-15 08:02:27,981 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 32 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-15 08:02:43,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3084170.0, ans=0.125 2024-08-15 08:02:45,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3084170.0, ans=0.0 2024-08-15 08:02:50,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3084270.0, ans=0.2 2024-08-15 08:02:50,774 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 4100, loss[loss=0.07606, beats_loss=0.01134, ecapa_loss=0.0001572, whisper_loss=0.06315, over 13541.00 frames. 
], tot_loss[loss=0.1052, beats_loss=0.01046, ecapa_loss=0.0001528, whisper_loss=0.0932, over 3889831.62 frames. ], batch size: 55, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:02:56,918 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 17 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-15 08:03:04,703 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 24 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-15 08:03:17,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3084470.0, ans=0.125 2024-08-15 08:03:25,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3084470.0, ans=0.1 2024-08-15 08:03:39,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3084570.0, ans=0.125 2024-08-15 08:03:48,088 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 22 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-15 08:03:48,679 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.27 vs. limit=22.5 2024-08-15 08:03:57,369 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.396e+01 2.573e+01 2.888e+01 2.852e+02, threshold=5.147e+01, percent-clipped=1.0 2024-08-15 08:03:59,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3084770.0, ans=0.09899494936611666 2024-08-15 08:04:00,150 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 4150, loss[loss=0.09371, beats_loss=0.01262, ecapa_loss=0.0001607, whisper_loss=0.07948, over 14954.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01054, ecapa_loss=0.000153, whisper_loss=0.09263, over 3888833.57 frames. 
], batch size: 63, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:04:29,740 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-15 08:04:32,540 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-15 08:04:35,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3084970.0, ans=0.1 2024-08-15 08:04:45,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3085070.0, ans=0.1 2024-08-15 08:04:50,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3085070.0, ans=0.0 2024-08-15 08:04:57,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3085170.0, ans=0.1 2024-08-15 08:05:08,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3085270.0, ans=0.125 2024-08-15 08:05:09,077 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 4200, loss[loss=0.1098, beats_loss=0.009227, ecapa_loss=0.0001559, whisper_loss=0.09899, over 21388.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01062, ecapa_loss=0.0001519, whisper_loss=0.0914, over 3890113.53 frames. ], batch size: 84, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:05:17,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3085270.0, ans=0.125 2024-08-15 08:05:31,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3085370.0, ans=0.125 2024-08-15 08:05:32,508 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
27 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-15 08:05:35,947 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.39 vs. limit=15.0 2024-08-15 08:05:44,923 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-15 08:05:57,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3085570.0, ans=0.125 2024-08-15 08:06:06,688 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 23 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-15 08:06:12,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3085670.0, ans=0.0 2024-08-15 08:06:15,233 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.224e+01 2.413e+01 2.831e+01 9.655e+01, threshold=4.827e+01, percent-clipped=1.0 2024-08-15 08:06:18,128 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 4250, loss[loss=0.1141, beats_loss=0.009128, ecapa_loss=0.0001666, whisper_loss=0.1034, over 21448.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01066, ecapa_loss=0.0001516, whisper_loss=0.09059, over 3901615.70 frames. ], batch size: 81, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:06:24,946 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-15 08:06:28,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3085770.0, ans=0.125 2024-08-15 08:06:29,839 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.76 vs. 
limit=15.0 2024-08-15 08:06:32,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3085870.0, ans=0.125 2024-08-15 08:06:39,323 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-15 08:06:44,593 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-15 08:06:56,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3085970.0, ans=0.0 2024-08-15 08:07:21,870 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 17 from LS+wenet, 21 from Vox, 16 fro AS 2024-08-15 08:07:25,186 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.77 vs. limit=15.0 2024-08-15 08:07:28,335 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 4300, loss[loss=0.0925, beats_loss=0.01303, ecapa_loss=0.0001486, whisper_loss=0.07798, over 21801.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.0001528, whisper_loss=0.09046, over 3878011.41 frames. ], batch size: 88, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:07:49,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3086370.0, ans=0.07 2024-08-15 08:08:14,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3086570.0, ans=0.035 2024-08-15 08:08:22,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3086570.0, ans=0.125 2024-08-15 08:08:29,672 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.88 vs. 
limit=15.0 2024-08-15 08:08:35,110 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.282e+01 2.452e+01 2.695e+01 5.506e+01, threshold=4.904e+01, percent-clipped=1.0 2024-08-15 08:08:38,022 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 4350, loss[loss=0.1033, beats_loss=0.01236, ecapa_loss=0.000117, whisper_loss=0.08982, over 19974.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.000152, whisper_loss=0.09045, over 3886584.01 frames. ], batch size: 79, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:09:04,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3086970.0, ans=0.0 2024-08-15 08:09:08,989 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-15 08:09:16,742 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-15 08:09:26,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3087070.0, ans=0.035 2024-08-15 08:09:27,826 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 21 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-15 08:09:29,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3087070.0, ans=0.1 2024-08-15 08:09:44,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3087170.0, ans=0.025 2024-08-15 08:09:46,897 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 4400, loss[loss=0.08993, beats_loss=0.01293, ecapa_loss=0.0001348, whisper_loss=0.07565, over 22740.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01051, ecapa_loss=0.0001507, whisper_loss=0.09152, over 3900328.44 frames. 
], batch size: 94, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:10:14,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3087470.0, ans=0.035 2024-08-15 08:10:17,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3087470.0, ans=0.125 2024-08-15 08:10:26,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3087570.0, ans=0.1 2024-08-15 08:10:37,734 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-15 08:10:52,409 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.331e+01 2.564e+01 2.900e+01 4.263e+01, threshold=5.127e+01, percent-clipped=0.0 2024-08-15 08:10:55,220 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 4450, loss[loss=0.09646, beats_loss=0.01182, ecapa_loss=0.0001617, whisper_loss=0.08303, over 21371.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01061, ecapa_loss=0.0001499, whisper_loss=0.09113, over 3927406.78 frames. ], batch size: 88, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:11:07,922 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-15 08:11:08,555 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.39 vs. 
limit=15.0 2024-08-15 08:11:09,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3087870.0, ans=0.0 2024-08-15 08:11:12,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3087870.0, ans=0.125 2024-08-15 08:11:22,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3087970.0, ans=0.125 2024-08-15 08:11:23,555 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-15 08:11:26,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3087970.0, ans=0.125 2024-08-15 08:11:31,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3087970.0, ans=0.0 2024-08-15 08:11:34,949 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.84 vs. limit=15.0 2024-08-15 08:11:40,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3088070.0, ans=0.2 2024-08-15 08:11:41,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3088070.0, ans=0.125 2024-08-15 08:11:46,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3088070.0, ans=0.125 2024-08-15 08:12:03,955 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-15 08:12:05,039 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 4500, loss[loss=0.1059, beats_loss=0.01046, ecapa_loss=0.000175, whisper_loss=0.09374, over 22288.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01062, ecapa_loss=0.0001506, whisper_loss=0.09077, over 3876101.62 frames. ], batch size: 92, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:12:10,296 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.38 vs. limit=6.0 2024-08-15 08:12:13,456 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-15 08:12:24,665 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-15 08:12:29,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3088370.0, ans=0.125 2024-08-15 08:12:39,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3088470.0, ans=0.2 2024-08-15 08:12:46,896 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 31 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-15 08:12:57,777 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 17 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-15 08:13:04,745 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 27 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-15 08:13:11,202 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.272e+01 2.536e+01 2.739e+01 4.209e+01, threshold=5.072e+01, percent-clipped=0.0 2024-08-15 08:13:11,393 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-15 08:13:13,971 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 4550, loss[loss=0.09329, beats_loss=0.0142, ecapa_loss=0.000118, whisper_loss=0.07791, over 17725.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01061, ecapa_loss=0.0001517, whisper_loss=0.09035, over 3881389.90 frames. 
], batch size: 70, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:13:30,576 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 23 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-15 08:13:35,877 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-15 08:13:36,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3088870.0, ans=0.125 2024-08-15 08:13:44,669 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=15.0 2024-08-15 08:13:56,553 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-15 08:13:58,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3089070.0, ans=0.2 2024-08-15 08:14:09,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3089170.0, ans=0.0 2024-08-15 08:14:20,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3089170.0, ans=0.0 2024-08-15 08:14:23,245 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 4600, loss[loss=0.09005, beats_loss=0.01082, ecapa_loss=0.0001223, whisper_loss=0.07801, over 23293.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01058, ecapa_loss=0.0001508, whisper_loss=0.09065, over 3886767.36 frames. ], batch size: 92, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:14:24,048 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.36 vs. limit=15.0 2024-08-15 08:14:24,321 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.39 vs. 
limit=12.0 2024-08-15 08:14:25,362 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.34 vs. limit=15.0 2024-08-15 08:14:26,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3089270.0, ans=0.0 2024-08-15 08:14:28,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3089270.0, ans=0.125 2024-08-15 08:14:53,903 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-15 08:15:28,186 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.27 vs. limit=10.0 2024-08-15 08:15:29,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3089670.0, ans=0.07 2024-08-15 08:15:34,714 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.352e+01 2.617e+01 2.995e+01 7.008e+01, threshold=5.233e+01, percent-clipped=1.0 2024-08-15 08:15:37,803 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 4650, loss[loss=0.09923, beats_loss=0.01029, ecapa_loss=0.0001609, whisper_loss=0.08733, over 22354.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01063, ecapa_loss=0.0001517, whisper_loss=0.09053, over 3898828.56 frames. ], batch size: 89, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:15:37,939 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
15 from LS+wenet, 17 from Vox, 33 from AS
2024-08-15 08:15:40,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3089770.0, ans=0.0
2024-08-15 08:15:47,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3089770.0, ans=0.1
2024-08-15 08:15:56,038 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 from AS
2024-08-15 08:15:57,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3089870.0, ans=0.0
2024-08-15 08:16:01,954 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 18 from Vox, 39 from AS
2024-08-15 08:16:08,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=3089970.0, ans=10.0
2024-08-15 08:16:17,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3089970.0, ans=0.0
2024-08-15 08:16:27,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=3090070.0, ans=12.0
2024-08-15 08:16:42,739 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 15 from Vox, 37 from AS
2024-08-15 08:16:45,862 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 17 from Vox, 29 from AS
2024-08-15 08:16:48,171 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0
2024-08-15 08:16:50,099 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 27 from Vox, 37 from AS
2024-08-15 08:16:54,906 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 4700, loss[loss=0.09969, beats_loss=0.01138, ecapa_loss=0.0001312, whisper_loss=0.087, over 23309.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01065, ecapa_loss=0.0001514, whisper_loss=0.09068, over 3899148.25 frames. ], batch size: 92, lr: 2.84e-03, grad_scale: 1.152921504606847e+18
2024-08-15 08:17:00,897 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 16 from Vox, 25 from AS
2024-08-15 08:17:04,409 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 34 from LS+wenet, 26 from Vox, 27 from AS
2024-08-15 08:17:22,276 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 24 from Vox, 27 from AS
2024-08-15 08:17:26,267 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 30 from Vox, 37 from AS
2024-08-15 08:17:40,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3090470.0, ans=0.0
2024-08-15 08:18:12,465 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.393e+01 2.597e+01 3.087e+01 7.330e+01, threshold=5.194e+01, percent-clipped=1.0
2024-08-15 08:18:14,092 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 17 from LS+wenet, 24 from Vox, 34 from AS
2024-08-15 08:18:15,785 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 4750, loss[loss=0.08282, beats_loss=0.0114, ecapa_loss=0.0001726, whisper_loss=0.06969, over 18246.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01065, ecapa_loss=0.000152, whisper_loss=0.08981, over 3859373.99 frames. ], batch size: 75, lr: 2.84e-03, grad_scale: 1.152921504606847e+18
2024-08-15 08:18:28,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3090770.0, ans=0.0
2024-08-15 08:18:30,405 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.99 vs. limit=12.0
2024-08-15 08:18:35,893 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 from AS
2024-08-15 08:19:07,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3091070.0, ans=0.2
2024-08-15 08:19:08,826 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 35 from LS+wenet, 18 from Vox, 39 from AS
2024-08-15 08:19:09,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3091070.0, ans=0.2
2024-08-15 08:19:13,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3091070.0, ans=0.125
2024-08-15 08:19:32,937 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 4800, loss[loss=0.1172, beats_loss=0.008322, ecapa_loss=0.0001796, whisper_loss=0.1071, over 18303.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01067, ecapa_loss=0.000152, whisper_loss=0.08951, over 3846800.25 frames. ], batch size: 73, lr: 2.84e-03, grad_scale: 5.764607523034235e+17
2024-08-15 08:19:36,088 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 25 from Vox, 30 from AS
2024-08-15 08:19:37,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3091270.0, ans=0.0
2024-08-15 08:19:42,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3091270.0, ans=0.125
2024-08-15 08:19:44,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3091270.0, ans=0.5
2024-08-15 08:20:00,079 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0
2024-08-15 08:20:08,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3091470.0, ans=0.125
2024-08-15 08:20:18,481 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.76 vs. limit=6.0
2024-08-15 08:20:50,974 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.389e+01 2.640e+01 2.976e+01 3.395e+02, threshold=5.281e+01, percent-clipped=5.0
2024-08-15 08:20:52,433 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 4850, loss[loss=0.08732, beats_loss=0.01233, ecapa_loss=0.0001521, whisper_loss=0.07347, over 21587.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01072, ecapa_loss=0.0001516, whisper_loss=0.08956, over 3890316.77 frames. ], batch size: 93, lr: 2.84e-03, grad_scale: 5.764607523034235e+17
2024-08-15 08:21:10,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3091870.0, ans=0.125
2024-08-15 08:21:23,751 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 19 from LS+wenet, 19 from Vox, 27 from AS
2024-08-15 08:21:29,592 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 35 from LS+wenet, 31 from Vox, 28 from AS
2024-08-15 08:21:38,099 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-15 08:21:48,361 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 28 from LS+wenet, 22 from Vox, 24 from AS
2024-08-15 08:22:03,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3092170.0, ans=0.0
2024-08-15 08:22:07,919 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.54 vs. limit=12.0
2024-08-15 08:22:12,372 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 4900, loss[loss=0.08882, beats_loss=0.01253, ecapa_loss=0.0001312, whisper_loss=0.07498, over 21476.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01064, ecapa_loss=0.0001513, whisper_loss=0.09031, over 3868618.85 frames. ], batch size: 85, lr: 2.84e-03, grad_scale: 5.764607523034235e+17
2024-08-15 08:22:30,105 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 14 from LS+wenet, 6 from Vox, 37 from AS
2024-08-15 08:23:17,983 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 17 from Vox, 42 from AS
2024-08-15 08:23:30,348 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.221e+01 2.391e+01 2.667e+01 3.893e+01, threshold=4.783e+01, percent-clipped=0.0
2024-08-15 08:23:31,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3092770.0, ans=0.125
2024-08-15 08:23:32,564 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 4950, loss[loss=0.08947, beats_loss=0.01187, ecapa_loss=0.0001272, whisper_loss=0.07632, over 18879.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01065, ecapa_loss=0.0001512, whisper_loss=0.09018, over 3859085.35 frames. ], batch size: 72, lr: 2.84e-03, grad_scale: 5.764607523034235e+17
2024-08-15 08:24:12,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3092970.0, ans=0.125
2024-08-15 08:24:30,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3093070.0, ans=0.0
2024-08-15 08:24:45,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3093170.0, ans=0.2
2024-08-15 08:24:47,352 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-15 08:24:48,031 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 5000, loss[loss=0.1225, beats_loss=0.009006, ecapa_loss=0.0001587, whisper_loss=0.1119, over 18096.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01066, ecapa_loss=0.0001518, whisper_loss=0.09083, over 3885550.73 frames. ], batch size: 68, lr: 2.84e-03, grad_scale: 5.764607523034235e+17
2024-08-15 08:24:51,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3093270.0, ans=0.125
2024-08-15 08:24:51,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3093270.0, ans=0.0
2024-08-15 08:24:53,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3093270.0, ans=0.125
2024-08-15 08:24:55,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3093270.0, ans=0.125
2024-08-15 08:24:55,417 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.02 vs. limit=22.5
2024-08-15 08:25:02,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3093270.0, ans=0.125
2024-08-15 08:25:08,491 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 27 from LS+wenet, 23 from Vox, 35 from AS
2024-08-15 08:25:11,308 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 17 from Vox, 29 from AS
2024-08-15 08:25:16,046 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 21 from LS+wenet, 30 from Vox, 28 from AS
2024-08-15 08:25:19,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3093470.0, ans=0.125
2024-08-15 08:25:26,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3093470.0, ans=0.125
2024-08-15 08:25:28,093 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0
2024-08-15 08:25:32,869 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 14 from Vox, 48 from AS
2024-08-15 08:25:34,362 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 15 from LS+wenet, 16 from Vox, 37 from AS
2024-08-15 08:25:38,675 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 25 from LS+wenet, 21 from Vox, 49 from AS
2024-08-15 08:25:39,748 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.47 vs. limit=15.0
2024-08-15 08:25:51,025 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.72 vs. limit=15.0
2024-08-15 08:26:03,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3093670.0, ans=0.1
2024-08-15 08:26:07,474 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.695e+01 2.349e+01 2.649e+01 2.919e+01 1.466e+02, threshold=5.298e+01, percent-clipped=4.0
2024-08-15 08:26:09,249 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 5050, loss[loss=0.1072, beats_loss=0.01151, ecapa_loss=0.0001714, whisper_loss=0.09397, over 17315.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01072, ecapa_loss=0.0001507, whisper_loss=0.09061, over 3903714.71 frames. ], batch size: 72, lr: 2.84e-03, grad_scale: 5.764607523034235e+17
2024-08-15 08:26:14,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3093770.0, ans=0.1
2024-08-15 08:26:23,017 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.69 vs. limit=15.0
2024-08-15 08:26:38,170 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 23 from Vox, 31 from AS
2024-08-15 08:26:44,049 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 18 from Vox, 24 from AS
2024-08-15 08:26:44,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3093970.0, ans=0.125
2024-08-15 08:26:48,801 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 19 from Vox, 30 from AS
2024-08-15 08:26:58,865 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 23 from Vox, 41 from AS
2024-08-15 08:27:11,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3094070.0, ans=0.2
2024-08-15 08:27:23,886 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 20 from LS+wenet, 29 from Vox, 32 from AS
2024-08-15 08:27:24,590 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.51 vs. limit=15.0
2024-08-15 08:27:30,212 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 5100, loss[loss=0.1151, beats_loss=0.008684, ecapa_loss=0.0001413, whisper_loss=0.105, over 21436.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01079, ecapa_loss=0.0001494, whisper_loss=0.09029, over 3922625.68 frames. ], batch size: 83, lr: 2.84e-03, grad_scale: 5.764607523034235e+17
2024-08-15 08:27:30,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3094270.0, ans=0.125
2024-08-15 08:27:34,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3094270.0, ans=0.1
2024-08-15 08:27:42,694 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.84 vs. limit=10.0
2024-08-15 08:27:56,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3094370.0, ans=0.1
2024-08-15 08:28:20,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3094570.0, ans=0.0
2024-08-15 08:28:24,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=3094570.0, ans=6.0
2024-08-15 08:28:25,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3094570.0, ans=0.125
2024-08-15 08:28:28,228 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 15 from Vox, 30 from AS
2024-08-15 08:28:46,522 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.59 vs. limit=15.0
2024-08-15 08:28:47,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3094670.0, ans=0.0
2024-08-15 08:28:50,037 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.353e+01 2.564e+01 3.013e+01 4.662e+01, threshold=5.127e+01, percent-clipped=0.0
2024-08-15 08:28:51,400 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 5150, loss[loss=0.09346, beats_loss=0.01084, ecapa_loss=0.0001278, whisper_loss=0.08134, over 14570.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01073, ecapa_loss=0.000148, whisper_loss=0.09114, over 3930138.49 frames. ], batch size: 54, lr: 2.84e-03, grad_scale: 5.764607523034235e+17
2024-08-15 08:28:52,253 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 25 from LS+wenet, 21 from Vox, 39 from AS
2024-08-15 08:29:17,736 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.41 vs. limit=6.0
2024-08-15 08:29:21,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3094870.0, ans=0.125
2024-08-15 08:29:29,424 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 29 from LS+wenet, 20 from Vox, 28 from AS
2024-08-15 08:29:45,473 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 16 from LS+wenet, 24 from Vox, 45 from AS
2024-08-15 08:29:50,362 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 23 from Vox, 32 from AS
2024-08-15 08:29:53,855 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 30 from Vox, 31 from AS
2024-08-15 08:30:12,659 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 5200, loss[loss=0.1023, beats_loss=0.0119, ecapa_loss=0.0001864, whisper_loss=0.08853, over 21976.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0107, ecapa_loss=0.0001483, whisper_loss=0.09095, over 3949980.94 frames. ], batch size: 93, lr: 2.84e-03, grad_scale: 5.764607523034235e+17
2024-08-15 08:30:13,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3095270.0, ans=0.1
2024-08-15 08:30:17,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3095270.0, ans=0.125
2024-08-15 08:30:21,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3095270.0, ans=0.2
2024-08-15 08:30:29,005 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.61 vs. limit=15.0
2024-08-15 08:30:36,869 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 18 from Vox, 34 from AS
2024-08-15 08:30:53,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3095470.0, ans=10.0
2024-08-15 08:31:31,058 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.296e+01 2.558e+01 2.889e+01 4.447e+01, threshold=5.115e+01, percent-clipped=0.0
2024-08-15 08:31:32,979 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 5250, loss[loss=0.1284, beats_loss=0.009629, ecapa_loss=0.0001386, whisper_loss=0.1173, over 17706.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0107, ecapa_loss=0.0001487, whisper_loss=0.09107, over 3950822.06 frames. ], batch size: 66, lr: 2.84e-03, grad_scale: 5.764607523034235e+17
2024-08-15 08:31:41,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3095770.0, ans=0.1
2024-08-15 08:31:55,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3095870.0, ans=0.125
2024-08-15 08:31:58,668 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.90 vs. limit=15.0
2024-08-15 08:32:19,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3096070.0, ans=0.1
2024-08-15 08:32:22,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3096070.0, ans=0.0
2024-08-15 08:32:26,292 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0
2024-08-15 08:32:38,425 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 31 from LS+wenet, 22 from Vox, 34 from AS
2024-08-15 08:32:39,688 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 33 from LS+wenet, 26 from Vox, 29 from AS
2024-08-15 08:32:51,555 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 5300, loss[loss=0.09641, beats_loss=0.01159, ecapa_loss=0.0001039, whisper_loss=0.08378, over 23006.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0106, ecapa_loss=0.000149, whisper_loss=0.09165, over 3933386.65 frames. ], batch size: 87, lr: 2.84e-03, grad_scale: 5.764607523034235e+17
2024-08-15 08:32:55,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3096270.0, ans=0.1
2024-08-15 08:33:08,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3096370.0, ans=0.1
2024-08-15 08:33:09,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3096370.0, ans=0.0
2024-08-15 08:33:30,298 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.14 vs. limit=15.0
2024-08-15 08:33:34,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3096470.0, ans=0.05
2024-08-15 08:33:35,655 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 22 from Vox, 36 from AS
2024-08-15 08:33:47,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3096570.0, ans=0.125
2024-08-15 08:33:50,231 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 20 from Vox, 27 from AS
2024-08-15 08:33:53,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3096570.0, ans=0.125
2024-08-15 08:34:11,421 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.309e+01 2.587e+01 2.813e+01 1.007e+02, threshold=5.174e+01, percent-clipped=2.0
2024-08-15 08:34:12,750 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 5350, loss[loss=0.08008, beats_loss=0.01327, ecapa_loss=0.0001172, whisper_loss=0.06563, over 17629.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01058, ecapa_loss=0.0001487, whisper_loss=0.0913, over 3930272.31 frames. ], batch size: 71, lr: 2.84e-03, grad_scale: 5.764607523034235e+17
2024-08-15 08:34:15,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3096770.0, ans=0.125
2024-08-15 08:34:37,143 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.74 vs. limit=15.0
2024-08-15 08:34:46,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3096970.0, ans=0.0
2024-08-15 08:34:53,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3096970.0, ans=0.0
2024-08-15 08:34:53,917 WARNING [optim.py:496] (2/4) Scaling gradients by 0.02987569198012352, model_norm_threshold=51.74189376831055
2024-08-15 08:34:54,084 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.43, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.299e+06, grad_sumsq=1.297e+08, orig_rms_sq=1.001e-02
2024-08-15 08:34:57,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3096970.0, ans=0.125
2024-08-15 08:35:10,318 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 19 from Vox, 30 from AS
2024-08-15 08:35:30,184 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 24 from LS+wenet, 20 from Vox, 26 from AS
2024-08-15 08:35:34,278 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 5400, loss[loss=0.09007, beats_loss=0.01056, ecapa_loss=0.0001322, whisper_loss=0.07819, over 17802.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01063, ecapa_loss=0.0001484, whisper_loss=0.0912, over 3934842.48 frames. ], batch size: 69, lr: 2.84e-03, grad_scale: 5.764607523034235e+17
2024-08-15 08:35:35,899 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 15 from Vox, 34 from AS
2024-08-15 08:35:43,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3097270.0, ans=0.025
2024-08-15 08:35:46,277 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 17 from Vox, 26 from AS
2024-08-15 08:35:46,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3097270.0, ans=0.07
2024-08-15 08:36:25,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3097470.0, ans=0.125
2024-08-15 08:36:44,859 WARNING [optim.py:496] (2/4) Scaling gradients by 0.07409150153398514, model_norm_threshold=51.74189376831055
2024-08-15 08:36:45,023 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.08, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.947e+04, grad_sumsq=3.947e+04, orig_rms_sq=1.000e+00
2024-08-15 08:36:56,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3097670.0, ans=0.125
2024-08-15 08:36:59,496 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.348e+01 2.686e+01 2.991e+01 1.732e+03, threshold=5.372e+01, percent-clipped=3.0
2024-08-15 08:37:00,937 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 5450, loss[loss=0.08714, beats_loss=0.01093, ecapa_loss=0.0001237, whisper_loss=0.07497, over 19722.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01064, ecapa_loss=0.0001485, whisper_loss=0.09098, over 3892560.53 frames. ], batch size: 77, lr: 2.84e-03, grad_scale: 5.764607523034235e+17
2024-08-15 08:37:12,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3097770.0, ans=0.125
2024-08-15 08:37:19,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3097870.0, ans=0.125
2024-08-15 08:37:23,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3097870.0, ans=0.0
2024-08-15 08:37:31,374 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 21 from Vox, 45 from AS
2024-08-15 08:37:48,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3097970.0, ans=0.0
2024-08-15 08:37:51,053 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 from AS
2024-08-15 08:38:41,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3098270.0, ans=0.125
2024-08-15 08:38:43,655 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 5500, loss[loss=0.1257, beats_loss=0.009052, ecapa_loss=0.000174, whisper_loss=0.1149, over 20878.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01064, ecapa_loss=0.0001481, whisper_loss=0.09075, over 3875866.64 frames. ], batch size: 81, lr: 2.84e-03, grad_scale: 5.764607523034235e+17
2024-08-15 08:38:52,131 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 from AS
2024-08-15 08:39:04,235 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 28 from LS+wenet, 22 from Vox, 32 from AS
2024-08-15 08:39:06,830 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 34 from LS+wenet, 16 from Vox, 34 from AS
2024-08-15 08:39:18,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3098370.0, ans=0.1
2024-08-15 08:39:19,584 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-15 08:39:26,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3098470.0, ans=0.0
2024-08-15 08:39:30,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3098470.0, ans=0.0
2024-08-15 08:39:31,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3098470.0, ans=0.1
2024-08-15 08:40:20,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3098670.0, ans=0.0
2024-08-15 08:40:24,823 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.188e+01 2.412e+01 2.681e+01 4.153e+01, threshold=4.824e+01, percent-clipped=0.0
2024-08-15 08:40:28,339 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 5550, loss[loss=0.09433, beats_loss=0.01049, ecapa_loss=0.0001654, whisper_loss=0.08219, over 21097.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01067, ecapa_loss=0.0001488, whisper_loss=0.09062, over 3890119.51 frames. ], batch size: 84, lr: 2.84e-03, grad_scale: 5.764607523034235e+17
2024-08-15 08:40:36,578 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 30 from LS+wenet, 18 from Vox, 29 from AS
2024-08-15 08:41:11,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3098870.0, ans=0.1
2024-08-15 08:41:17,021 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.88 vs. limit=22.5
2024-08-15 08:41:19,509 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.81 vs. limit=15.0
2024-08-15 08:41:31,615 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 22 from LS+wenet, 15 from Vox, 24 from AS
2024-08-15 08:41:36,299 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 23 from Vox, 42 from AS
2024-08-15 08:41:46,273 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 25 from LS+wenet, 20 from Vox, 42 from AS
2024-08-15 08:41:55,835 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.67 vs. limit=15.0
2024-08-15 08:41:57,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3099070.0, ans=0.125
2024-08-15 08:42:04,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3099170.0, ans=0.125
2024-08-15 08:42:04,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3099170.0, ans=0.125
2024-08-15 08:42:19,697 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 15 from Vox, 23 from AS
2024-08-15 08:42:21,955 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 19 from Vox, 44 from AS
2024-08-15 08:42:29,203 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 5600, loss[loss=0.07132, beats_loss=0.01221, ecapa_loss=0.0001299, whisper_loss=0.05781, over 19226.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01076, ecapa_loss=0.000149, whisper_loss=0.08952, over 3848550.90 frames. ], batch size: 77, lr: 2.84e-03, grad_scale: 5.764607523034235e+17
2024-08-15 08:42:31,315 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 15 from Vox, 30 from AS
2024-08-15 08:42:46,665 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.30 vs. limit=15.0
2024-08-15 08:43:08,742 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 19 from LS+wenet, 17 from Vox, 36 from AS
2024-08-15 08:43:12,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3099370.0, ans=0.2
2024-08-15 08:43:28,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3099470.0, ans=0.5
2024-08-15 08:43:39,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3099470.0, ans=0.1
2024-08-15 08:43:56,019 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 27 from Vox, 35 from AS
2024-08-15 08:44:05,538 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 from AS
2024-08-15 08:44:35,209 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.681e+01 2.363e+01 2.562e+01 2.966e+01 4.681e+01, threshold=5.125e+01, percent-clipped=0.0
2024-08-15 08:44:36,792 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 5650, loss[loss=0.1077, beats_loss=0.009099, ecapa_loss=0.000174, whisper_loss=0.09685, over 22377.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01079, ecapa_loss=0.0001505, whisper_loss=0.08859, over 3875531.62 frames. ], batch size: 93, lr: 2.84e-03, grad_scale: 5.764607523034235e+17
2024-08-15 08:44:47,931 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 31 from LS+wenet, 19 from Vox, 31 from AS
2024-08-15 08:44:59,832 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 16 from LS+wenet, 14 from Vox, 30 from AS
2024-08-15 08:45:05,551 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 31 from LS+wenet, 21 from Vox, 35 from AS
2024-08-15 08:45:23,260 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 22 from Vox, 45 from AS
2024-08-15 08:46:16,996 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 14 from LS+wenet, 22 from Vox, 30 from AS
2024-08-15 08:46:18,490 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 27 from Vox, 30 from AS
2024-08-15 08:46:23,599 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 5700, loss[loss=0.1043, beats_loss=0.01001, ecapa_loss=0.0001238, whisper_loss=0.0931, over 14857.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01073, ecapa_loss=0.0001517, whisper_loss=0.08917, over 3883344.02 frames. ], batch size: 54, lr: 2.84e-03, grad_scale: 5.764607523034235e+17
2024-08-15 08:46:34,114 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 31 from LS+wenet, 25 from Vox, 23 from AS
2024-08-15 08:47:00,355 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 24 from LS+wenet, 23 from Vox, 38 from AS
2024-08-15 08:47:32,326 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 29 from LS+wenet, 19 from Vox, 27 from AS
2024-08-15 08:47:38,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3100670.0, ans=0.125
2024-08-15 08:47:42,132 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 17 from Vox, 31 from AS
2024-08-15 08:47:43,311 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.415e+01 2.624e+01 2.975e+01 2.275e+02, threshold=5.249e+01, percent-clipped=3.0
2024-08-15 08:47:44,773 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 5750, loss[loss=0.1002, beats_loss=0.01147, ecapa_loss=0.0002036, whisper_loss=0.08672, over 17365.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01068, ecapa_loss=0.0001525, whisper_loss=0.08961, over 3865574.71 frames. ], batch size: 73, lr: 2.84e-03, grad_scale: 5.764607523034235e+17
2024-08-15 08:47:49,590 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 12 from LS+wenet, 19 from Vox, 30 from AS
2024-08-15 08:47:54,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3100770.0, ans=0.125
2024-08-15 08:47:56,944 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 from AS
2024-08-15 08:48:13,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3100870.0, ans=0.125
2024-08-15 08:48:27,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3100970.0, ans=0.125
2024-08-15 08:48:32,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3101070.0, ans=0.2
2024-08-15 08:48:34,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3101070.0, ans=0.2
2024-08-15 08:49:05,627 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 5800, loss[loss=0.0974, beats_loss=0.006983, ecapa_loss=0.0002031, whisper_loss=0.08839, over 17455.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0106, ecapa_loss=0.0001535, whisper_loss=0.09057, over 3859746.87 frames. ], batch size: 70, lr: 2.84e-03, grad_scale: 5.764607523034235e+17
2024-08-15 08:49:32,800 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 24 from LS+wenet, 13 from Vox, 27 from AS
2024-08-15 08:49:34,564 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 27 from Vox, 33 from AS
2024-08-15 08:49:35,940 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 21 from Vox, 39 from AS
2024-08-15 08:49:44,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3101470.0, ans=0.125
2024-08-15 08:49:47,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3101470.0, ans=0.125
2024-08-15 08:50:04,757 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 16 from LS+wenet, 17 from Vox, 23 from AS
2024-08-15 08:50:18,276 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.84 vs. limit=15.0
2024-08-15 08:50:23,932 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.392e+01 2.742e+01 3.107e+01 2.079e+02, threshold=5.485e+01, percent-clipped=4.0
2024-08-15 08:50:25,282 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 5850, loss[loss=0.09119, beats_loss=0.01249, ecapa_loss=0.0001644, whisper_loss=0.07705, over 18315.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01068, ecapa_loss=0.000153, whisper_loss=0.0896, over 3881659.86 frames. ], batch size: 76, lr: 2.84e-03, grad_scale: 5.764607523034235e+17
2024-08-15 08:50:28,945 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 17 from LS+wenet, 20 from Vox, 47 from AS
2024-08-15 08:50:45,282 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts.
17 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-15 08:51:00,530 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.283e+01 2024-08-15 08:51:10,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3101970.0, ans=0.0 2024-08-15 08:51:21,268 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0 2024-08-15 08:51:33,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3102170.0, ans=0.125 2024-08-15 08:51:36,851 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.92 vs. limit=15.0 2024-08-15 08:51:44,950 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 5900, loss[loss=0.1155, beats_loss=0.008723, ecapa_loss=0.0001872, whisper_loss=0.1049, over 21680.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01068, ecapa_loss=0.0001521, whisper_loss=0.08935, over 3900804.50 frames. ], batch size: 91, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:51:48,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=3102270.0, ans=0.1 2024-08-15 08:51:56,394 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-15 08:52:01,028 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 30 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-15 08:52:01,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3102370.0, ans=0.0 2024-08-15 08:52:09,613 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.17 vs. 
limit=12.0 2024-08-15 08:52:12,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3102370.0, ans=0.1 2024-08-15 08:52:27,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3102470.0, ans=0.0 2024-08-15 08:52:29,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3102470.0, ans=10.0 2024-08-15 08:52:29,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3102470.0, ans=0.125 2024-08-15 08:52:30,251 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 35 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-15 08:52:36,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3102570.0, ans=0.0 2024-08-15 08:52:43,476 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-15 08:52:51,609 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 18 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-15 08:52:56,390 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.04 vs. limit=22.5 2024-08-15 08:52:57,697 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.11 vs. limit=15.0 2024-08-15 08:52:59,508 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.266e+01 2.479e+01 2.808e+01 3.444e+02, threshold=4.958e+01, percent-clipped=1.0 2024-08-15 08:53:01,407 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 5950, loss[loss=0.1311, beats_loss=0.00916, ecapa_loss=0.0001634, whisper_loss=0.1203, over 23198.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01068, ecapa_loss=0.0001519, whisper_loss=0.08986, over 3904448.61 frames. ], batch size: 90, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:53:13,664 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.04 vs. limit=10.0 2024-08-15 08:53:18,155 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-15 08:53:32,075 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-15 08:53:35,917 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.99 vs. limit=22.5 2024-08-15 08:53:36,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3102970.0, ans=0.125 2024-08-15 08:53:43,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3102970.0, ans=0.0 2024-08-15 08:54:18,680 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 6000, loss[loss=0.1023, beats_loss=0.01075, ecapa_loss=0.0001446, whisper_loss=0.09015, over 17723.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01078, ecapa_loss=0.0001519, whisper_loss=0.08917, over 3908083.14 frames. ], batch size: 69, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:54:18,681 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-15 08:54:59,165 INFO [train_multi_KD3.py:1149] (2/4) Epoch 22, validation on ASR_libri: loss=0.2524, beats_loss=0, ecapa_loss=0.0005326, whisper_loss=0.2471, over 922467.00 frames. 2024-08-15 08:55:14,758 INFO [train_multi_KD3.py:1149] (2/4) Epoch 22, validation on SV_voxceleb1: loss=0.004204, beats_loss=0, ecapa_loss=0.0004204, whisper_loss=0, over 939242.00 frames. 
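The per-batch `loss[...]` / `tot_loss[...]` entries in this log report a total that appears to be a weighted sum of the three distillation losses. A minimal sketch of that arithmetic, assuming the scales from the run config in the header (`beats_loss_scale=1.0`; the ecapa scale of 10.0 is an assumption inferred from the `..._scale_10.0_...` fragment of the `exp_dir` name, since the header is truncated before `ecapa_loss_scale`):

```python
# Sketch (assumption): how tot_loss in the log lines appears to combine the
# three knowledge-distillation losses. Scales are read from the run config /
# exp_dir name, not confirmed by the code itself.
BEATS_SCALE = 1.0
ECAPA_SCALE = 10.0    # inferred from "..._scale_10.0_..." in the exp_dir name
WHISPER_SCALE = 1.0

def total_loss(beats_loss, ecapa_loss, whisper_loss):
    """Weighted sum of the per-teacher losses, as the log's tot_loss suggests."""
    return (BEATS_SCALE * beats_loss
            + ECAPA_SCALE * ecapa_loss
            + WHISPER_SCALE * whisper_loss)

# Values from the "Epoch 22, batch 5600" tot_loss entry above:
t = total_loss(0.01076, 0.000149, 0.08952)
print(round(t, 4))  # 0.1018, matching the logged tot_loss
```

The same check reproduces other entries, e.g. the batch-5600 per-sample loss (0.01221 + 10 × 0.0001299 + 0.05781 ≈ 0.07132), which is what makes the 10× ecapa weighting plausible.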
2024-08-15 08:57:14,017 INFO [train_multi_KD3.py:1149] (2/4) Epoch 22, validation on AT_audioset: loss=0.02337, beats_loss=0.02337, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 08:57:14,022 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-15 08:57:20,173 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-15 08:57:23,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3103270.0, ans=0.1 2024-08-15 08:57:43,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3103470.0, ans=0.0 2024-08-15 08:57:59,605 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 19 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-15 08:58:17,113 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.40 vs. limit=8.0 2024-08-15 08:58:28,664 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.350e+01 2.585e+01 2.886e+01 6.077e+01, threshold=5.169e+01, percent-clipped=1.0 2024-08-15 08:58:30,781 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 6050, loss[loss=0.1043, beats_loss=0.00958, ecapa_loss=0.0001664, whisper_loss=0.09301, over 21128.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01066, ecapa_loss=0.0001524, whisper_loss=0.09038, over 3914304.99 frames. ], batch size: 85, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:58:35,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3103770.0, ans=0.125 2024-08-15 08:58:40,768 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.85 vs. 
limit=22.5 2024-08-15 08:58:55,736 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.94 vs. limit=15.0 2024-08-15 08:58:58,152 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 08:59:14,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3104070.0, ans=0.125 2024-08-15 08:59:24,728 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 35 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-15 08:59:31,043 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-15 08:59:31,314 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 08:59:36,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3104170.0, ans=0.1 2024-08-15 08:59:39,932 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-15 08:59:45,340 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 6100, loss[loss=0.06989, beats_loss=0.01395, ecapa_loss=0.00018, whisper_loss=0.05414, over 15213.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01065, ecapa_loss=0.0001524, whisper_loss=0.09044, over 3928707.87 frames. ], batch size: 64, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:59:59,115 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
26 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-15 09:00:32,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3104570.0, ans=0.125 2024-08-15 09:00:57,406 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.721e+01 2.233e+01 2.517e+01 2.744e+01 4.126e+01, threshold=5.033e+01, percent-clipped=0.0 2024-08-15 09:00:58,740 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 6150, loss[loss=0.09674, beats_loss=0.00999, ecapa_loss=0.0001679, whisper_loss=0.08507, over 20643.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01076, ecapa_loss=0.0001515, whisper_loss=0.09032, over 3940225.45 frames. ], batch size: 84, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:01:01,711 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-15 09:01:02,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3104770.0, ans=0.125 2024-08-15 09:01:02,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3104770.0, ans=0.125 2024-08-15 09:01:03,900 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 24 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-15 09:01:12,448 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.59 vs. limit=10.0 2024-08-15 09:01:43,948 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
20 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-15 09:01:44,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3105070.0, ans=0.2 2024-08-15 09:01:47,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3105070.0, ans=0.0 2024-08-15 09:01:57,222 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 17 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-15 09:02:03,169 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-15 09:02:03,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3105170.0, ans=0.0 2024-08-15 09:02:13,047 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 6200, loss[loss=0.1054, beats_loss=0.01054, ecapa_loss=0.0001711, whisper_loss=0.09314, over 22612.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0107, ecapa_loss=0.0001518, whisper_loss=0.09053, over 3928132.25 frames. ], batch size: 93, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:02:19,224 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 23 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-15 09:02:30,517 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-15 09:02:31,979 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 20 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-15 09:02:33,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3105370.0, ans=0.025 2024-08-15 09:02:38,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3105370.0, ans=0.125 2024-08-15 09:03:21,736 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
26 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-15 09:03:22,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3105670.0, ans=0.0 2024-08-15 09:03:26,001 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.684e+01 2.264e+01 2.437e+01 2.763e+01 4.898e+01, threshold=4.875e+01, percent-clipped=0.0 2024-08-15 09:03:28,389 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 6250, loss[loss=0.0924, beats_loss=0.009963, ecapa_loss=0.0001453, whisper_loss=0.08098, over 20033.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01062, ecapa_loss=0.0001525, whisper_loss=0.09031, over 3902957.96 frames. ], batch size: 78, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:03:35,932 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 24 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-15 09:04:23,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3106070.0, ans=0.125 2024-08-15 09:04:32,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3106170.0, ans=0.0 2024-08-15 09:04:43,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3106170.0, ans=0.125 2024-08-15 09:04:45,423 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 6300, loss[loss=0.07758, beats_loss=0.01136, ecapa_loss=0.0001678, whisper_loss=0.06454, over 15390.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01067, ecapa_loss=0.000152, whisper_loss=0.09011, over 3921867.52 frames. ], batch size: 60, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:04:50,574 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
18 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-15 09:05:04,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3106370.0, ans=0.1 2024-08-15 09:05:21,961 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 16 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-15 09:05:53,242 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.59 vs. limit=22.5 2024-08-15 09:05:53,778 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-15 09:06:01,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3106670.0, ans=0.125 2024-08-15 09:06:07,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3106670.0, ans=0.2 2024-08-15 09:06:07,832 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.366e+01 2.627e+01 2.982e+01 5.649e+01, threshold=5.254e+01, percent-clipped=1.0 2024-08-15 09:06:09,651 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 6350, loss[loss=0.1016, beats_loss=0.01014, ecapa_loss=0.000181, whisper_loss=0.08965, over 15376.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01063, ecapa_loss=0.0001519, whisper_loss=0.09062, over 3917420.97 frames. 
], batch size: 62, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:06:19,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=3106770.0, ans=15.0 2024-08-15 09:06:21,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3106770.0, ans=0.0 2024-08-15 09:06:45,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3106970.0, ans=0.125 2024-08-15 09:06:47,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3106970.0, ans=0.1 2024-08-15 09:07:04,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3107070.0, ans=0.0 2024-08-15 09:07:29,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3107270.0, ans=0.125 2024-08-15 09:07:30,274 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 6400, loss[loss=0.1076, beats_loss=0.01166, ecapa_loss=0.0001334, whisper_loss=0.09463, over 23867.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.0001512, whisper_loss=0.09093, over 3904256.85 frames. ], batch size: 93, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:07:34,919 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 09:08:21,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3107570.0, ans=0.0 2024-08-15 09:08:23,447 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.31 vs. 
limit=15.0 2024-08-15 09:08:25,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3107570.0, ans=0.125 2024-08-15 09:08:39,241 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-15 09:08:52,621 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.320e+01 2.533e+01 2.838e+01 5.335e+01, threshold=5.066e+01, percent-clipped=1.0 2024-08-15 09:08:54,039 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 6450, loss[loss=0.1148, beats_loss=0.01056, ecapa_loss=0.0001521, whisper_loss=0.1027, over 22635.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01065, ecapa_loss=0.0001501, whisper_loss=0.09084, over 3896466.43 frames. ], batch size: 90, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:09:08,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3107770.0, ans=0.125 2024-08-15 09:09:16,098 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 23 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 09:09:30,826 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-15 09:09:34,377 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.37 vs. limit=12.0 2024-08-15 09:09:36,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3107970.0, ans=0.0 2024-08-15 09:09:51,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3108070.0, ans=0.0 2024-08-15 09:09:57,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3108070.0, ans=0.1 2024-08-15 09:10:03,667 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
18 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-15 09:10:11,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3108170.0, ans=0.125 2024-08-15 09:10:16,123 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 6500, loss[loss=0.11, beats_loss=0.01036, ecapa_loss=0.0001727, whisper_loss=0.09794, over 15021.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01062, ecapa_loss=0.0001524, whisper_loss=0.09129, over 3916861.46 frames. ], batch size: 62, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:10:20,943 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-15 09:10:40,124 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 20 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-15 09:10:42,927 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 24 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-15 09:11:03,799 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.90 vs. limit=15.0 2024-08-15 09:11:07,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3108570.0, ans=0.125 2024-08-15 09:11:12,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3108570.0, ans=0.125 2024-08-15 09:11:33,004 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.377e+01 2.603e+01 2.970e+01 3.973e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-15 09:11:33,320 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 09:11:34,745 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 6550, loss[loss=0.1182, beats_loss=0.009602, ecapa_loss=0.0001418, whisper_loss=0.1072, over 22852.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.0106, ecapa_loss=0.0001525, whisper_loss=0.09131, over 3942347.54 frames. ], batch size: 89, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:11:37,311 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0 2024-08-15 09:11:59,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3108870.0, ans=0.2 2024-08-15 09:12:23,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3109070.0, ans=0.125 2024-08-15 09:12:30,009 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.24 vs. limit=15.0 2024-08-15 09:12:46,193 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.07 vs. limit=22.5 2024-08-15 09:12:53,778 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 6600, loss[loss=0.1181, beats_loss=0.008609, ecapa_loss=0.0001618, whisper_loss=0.1079, over 22402.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01059, ecapa_loss=0.0001526, whisper_loss=0.09173, over 3959472.14 frames. 
], batch size: 88, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:12:59,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3109270.0, ans=0.1 2024-08-15 09:13:12,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3109370.0, ans=0.0 2024-08-15 09:13:19,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3109370.0, ans=0.0 2024-08-15 09:13:22,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3109370.0, ans=0.0 2024-08-15 09:13:22,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3109370.0, ans=0.1 2024-08-15 09:13:35,443 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.42 vs. limit=15.0 2024-08-15 09:13:40,773 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.73 vs. limit=15.0 2024-08-15 09:14:10,258 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.330e+01 2.492e+01 2.798e+01 4.030e+01, threshold=4.985e+01, percent-clipped=0.0 2024-08-15 09:14:11,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3109770.0, ans=0.125 2024-08-15 09:14:11,813 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 6650, loss[loss=0.08582, beats_loss=0.01189, ecapa_loss=0.0001673, whisper_loss=0.07226, over 21231.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01054, ecapa_loss=0.0001523, whisper_loss=0.09205, over 3936969.99 frames. 
], batch size: 90, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:14:25,247 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-15 09:14:26,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3109770.0, ans=0.035 2024-08-15 09:14:26,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3109770.0, ans=0.125 2024-08-15 09:14:34,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3109870.0, ans=0.0 2024-08-15 09:14:35,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3109870.0, ans=0.0 2024-08-15 09:14:37,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3109870.0, ans=0.0 2024-08-15 09:14:38,908 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-15 09:15:03,447 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.27 vs. limit=22.5 2024-08-15 09:15:13,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3110070.0, ans=0.125 2024-08-15 09:15:31,314 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 6700, loss[loss=0.07867, beats_loss=0.01192, ecapa_loss=0.0001364, whisper_loss=0.06539, over 14620.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01058, ecapa_loss=0.0001517, whisper_loss=0.09156, over 3921914.33 frames. ], batch size: 60, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:15:39,460 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
21 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-15 09:16:21,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3110570.0, ans=0.1 2024-08-15 09:16:29,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3110570.0, ans=0.1 2024-08-15 09:16:32,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3110570.0, ans=0.125 2024-08-15 09:16:55,259 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.348e+01 2.579e+01 2.866e+01 4.401e+01, threshold=5.159e+01, percent-clipped=0.0 2024-08-15 09:16:56,750 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 6750, loss[loss=0.08653, beats_loss=0.01221, ecapa_loss=0.0001569, whisper_loss=0.07275, over 16667.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01065, ecapa_loss=0.0001517, whisper_loss=0.09092, over 3912415.46 frames. ], batch size: 67, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:16:59,972 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.95 vs. limit=22.5 2024-08-15 09:17:12,455 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.62 vs. 
limit=15.0 2024-08-15 09:17:30,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3110870.0, ans=0.0 2024-08-15 09:17:44,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3110970.0, ans=0.125 2024-08-15 09:17:52,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3111070.0, ans=0.125 2024-08-15 09:17:57,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3111070.0, ans=0.125 2024-08-15 09:18:05,613 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.02 vs. limit=22.5 2024-08-15 09:18:11,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3111170.0, ans=0.1 2024-08-15 09:18:17,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3111170.0, ans=0.125 2024-08-15 09:18:17,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3111170.0, ans=0.125 2024-08-15 09:18:21,457 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 6800, loss[loss=0.1064, beats_loss=0.01047, ecapa_loss=0.0001636, whisper_loss=0.09426, over 21657.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0107, ecapa_loss=0.0001521, whisper_loss=0.0903, over 3908392.11 frames. ], batch size: 86, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:18:24,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3111270.0, ans=0.1 2024-08-15 09:18:27,711 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
21 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-15 09:18:45,855 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.55 vs. limit=15.0 2024-08-15 09:18:48,414 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.67 vs. limit=10.0 2024-08-15 09:18:49,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3111370.0, ans=0.125 2024-08-15 09:19:03,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3111470.0, ans=0.125 2024-08-15 09:19:14,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3111570.0, ans=0.025 2024-08-15 09:19:32,151 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-15 09:19:41,868 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.366e+01 2.737e+01 3.020e+01 4.133e+01, threshold=5.473e+01, percent-clipped=0.0 2024-08-15 09:19:43,254 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 6850, loss[loss=0.08586, beats_loss=0.0111, ecapa_loss=0.0001482, whisper_loss=0.07328, over 16506.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01064, ecapa_loss=0.0001512, whisper_loss=0.09046, over 3894656.01 frames. ], batch size: 65, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:19:48,612 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.18 vs. 
limit=22.5 2024-08-15 09:19:52,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3111770.0, ans=0.1 2024-08-15 09:19:54,618 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.28 vs. limit=15.0 2024-08-15 09:20:00,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3111870.0, ans=0.125 2024-08-15 09:20:14,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3111970.0, ans=0.05 2024-08-15 09:20:26,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3111970.0, ans=0.2 2024-08-15 09:20:27,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3111970.0, ans=0.125 2024-08-15 09:20:32,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3112070.0, ans=0.1 2024-08-15 09:20:34,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3112070.0, ans=0.1 2024-08-15 09:20:59,438 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 20 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-15 09:21:00,718 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-15 09:21:01,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3112170.0, ans=0.125 2024-08-15 09:21:05,646 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 6900, loss[loss=0.09807, beats_loss=0.0122, ecapa_loss=0.0001204, whisper_loss=0.08467, over 19616.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.0106, ecapa_loss=0.0001519, whisper_loss=0.09056, over 3905206.53 frames. ], batch size: 77, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:21:31,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3112370.0, ans=0.07 2024-08-15 09:21:37,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3112470.0, ans=0.0 2024-08-15 09:21:47,464 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.33 vs. limit=22.5 2024-08-15 09:21:51,730 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.53 vs. limit=12.0 2024-08-15 09:21:57,605 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-15 09:22:00,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3112570.0, ans=0.125 2024-08-15 09:22:08,230 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.37 vs. 
limit=10.0 2024-08-15 09:22:18,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3112670.0, ans=0.125 2024-08-15 09:22:25,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3112670.0, ans=0.125 2024-08-15 09:22:25,831 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.697e+01 2.330e+01 2.607e+01 2.906e+01 3.903e+01, threshold=5.213e+01, percent-clipped=0.0 2024-08-15 09:22:27,879 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 6950, loss[loss=0.1186, beats_loss=0.007109, ecapa_loss=0.0002126, whisper_loss=0.1094, over 17332.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01066, ecapa_loss=0.0001515, whisper_loss=0.09064, over 3890041.84 frames. ], batch size: 70, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:22:52,448 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 09:22:58,007 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 29 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-15 09:22:58,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3112870.0, ans=0.125 2024-08-15 09:23:04,798 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 18 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-15 09:23:26,964 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 20 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-15 09:23:35,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3113170.0, ans=0.125 2024-08-15 09:23:36,943 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.17 vs. 
limit=15.0 2024-08-15 09:23:40,721 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 09:23:44,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3113170.0, ans=0.1 2024-08-15 09:23:46,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3113170.0, ans=0.1 2024-08-15 09:23:49,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3113170.0, ans=0.0 2024-08-15 09:23:53,476 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 7000, loss[loss=0.1023, beats_loss=0.01414, ecapa_loss=0.0001304, whisper_loss=0.08681, over 22947.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01062, ecapa_loss=0.0001529, whisper_loss=0.09098, over 3878171.08 frames. ], batch size: 93, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:24:10,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3113370.0, ans=0.0 2024-08-15 09:24:14,697 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-15 09:24:25,332 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-15 09:24:48,199 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-15 09:24:55,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3113570.0, ans=0.125 2024-08-15 09:24:58,205 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
22 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-15 09:25:11,779 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.285e+01 2.515e+01 2.817e+01 4.322e+01, threshold=5.031e+01, percent-clipped=0.0 2024-08-15 09:25:13,437 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 7050, loss[loss=0.1417, beats_loss=0.008037, ecapa_loss=0.0001416, whisper_loss=0.1322, over 18370.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0106, ecapa_loss=0.0001523, whisper_loss=0.09123, over 3886944.88 frames. ], batch size: 69, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:25:23,611 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.07 vs. limit=15.0 2024-08-15 09:25:25,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3113770.0, ans=0.0 2024-08-15 09:25:37,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3113870.0, ans=0.125 2024-08-15 09:25:58,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3113970.0, ans=0.2 2024-08-15 09:26:02,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3114070.0, ans=0.125 2024-08-15 09:26:07,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3114070.0, ans=0.125 2024-08-15 09:26:15,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3114070.0, ans=0.0 2024-08-15 09:26:19,314 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 28 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-15 09:26:27,179 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
25 from LS+wenet, 18 from Vox, 49 fro AS 2024-08-15 09:26:37,067 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 7100, loss[loss=0.1011, beats_loss=0.009537, ecapa_loss=0.0001507, whisper_loss=0.09002, over 14468.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01062, ecapa_loss=0.0001516, whisper_loss=0.09066, over 3874849.66 frames. ], batch size: 58, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:26:59,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=3114370.0, ans=0.2 2024-08-15 09:27:08,918 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 18 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-15 09:27:28,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=3114570.0, ans=22.5 2024-08-15 09:27:38,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3114670.0, ans=0.2 2024-08-15 09:27:47,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3114670.0, ans=0.125 2024-08-15 09:27:49,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3114670.0, ans=0.125 2024-08-15 09:27:53,917 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.689e+01 2.259e+01 2.510e+01 2.858e+01 3.355e+02, threshold=5.020e+01, percent-clipped=2.0 2024-08-15 09:27:55,414 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 7150, loss[loss=0.0893, beats_loss=0.01286, ecapa_loss=0.000148, whisper_loss=0.07497, over 21854.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01062, ecapa_loss=0.0001503, whisper_loss=0.09055, over 3867904.12 frames. 
], batch size: 90, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:27:59,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3114770.0, ans=0.1 2024-08-15 09:28:04,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3114770.0, ans=0.125 2024-08-15 09:28:05,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3114770.0, ans=0.1 2024-08-15 09:28:05,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3114770.0, ans=0.125 2024-08-15 09:28:12,808 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.61 vs. limit=15.0 2024-08-15 09:28:15,197 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 13 from Vox, 46 fro AS 2024-08-15 09:28:24,403 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.90 vs. limit=15.0 2024-08-15 09:28:26,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3114970.0, ans=0.125 2024-08-15 09:28:39,401 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3114970.0, ans=0.125 2024-08-15 09:28:56,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3115070.0, ans=0.125 2024-08-15 09:29:10,008 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 
37 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-15 09:29:18,993 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 7200, loss[loss=0.1004, beats_loss=0.01081, ecapa_loss=0.0001513, whisper_loss=0.08806, over 20427.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01062, ecapa_loss=0.0001501, whisper_loss=0.09056, over 3881101.00 frames. ], batch size: 83, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:29:31,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3115270.0, ans=0.125 2024-08-15 09:29:48,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3115370.0, ans=0.0 2024-08-15 09:29:50,500 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.00 vs. limit=15.0 2024-08-15 09:30:14,547 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.91 vs. limit=12.0 2024-08-15 09:30:19,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3115570.0, ans=0.125 2024-08-15 09:30:22,182 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-15 09:30:31,263 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 31 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-15 09:30:41,802 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.339e+01 2.550e+01 2.963e+01 5.481e+01, threshold=5.099e+01, percent-clipped=2.0 2024-08-15 09:30:43,353 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 7250, loss[loss=0.09174, beats_loss=0.01118, ecapa_loss=0.0001202, whisper_loss=0.07936, over 21100.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01048, ecapa_loss=0.0001502, whisper_loss=0.09132, over 3885832.37 frames. 
], batch size: 83, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:30:47,901 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 29 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-15 09:30:50,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3115770.0, ans=0.125 2024-08-15 09:30:59,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3115870.0, ans=0.05 2024-08-15 09:31:00,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3115870.0, ans=0.0 2024-08-15 09:31:09,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3115870.0, ans=0.125 2024-08-15 09:31:19,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3115970.0, ans=0.1 2024-08-15 09:31:22,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3115970.0, ans=0.09899494936611666 2024-08-15 09:31:43,211 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-15 09:31:45,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3116070.0, ans=0.125 2024-08-15 09:31:46,894 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.65 vs. limit=15.0 2024-08-15 09:31:59,544 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 
26 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-15 09:32:01,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3116270.0, ans=0.05 2024-08-15 09:32:02,230 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 7300, loss[loss=0.09861, beats_loss=0.01215, ecapa_loss=0.0001292, whisper_loss=0.08517, over 15707.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01056, ecapa_loss=0.0001501, whisper_loss=0.09106, over 3864372.95 frames. ], batch size: 61, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:32:41,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3116470.0, ans=0.125 2024-08-15 09:33:08,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3116670.0, ans=0.1 2024-08-15 09:33:15,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3116670.0, ans=0.125 2024-08-15 09:33:20,880 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.398e+01 2.645e+01 3.010e+01 2.880e+02, threshold=5.290e+01, percent-clipped=2.0 2024-08-15 09:33:22,254 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 7350, loss[loss=0.09422, beats_loss=0.009123, ecapa_loss=0.0001365, whisper_loss=0.08374, over 16116.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0106, ecapa_loss=0.0001498, whisper_loss=0.09058, over 3850484.24 frames. 
], batch size: 59, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:33:24,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3116770.0, ans=0.0 2024-08-15 09:33:38,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3116870.0, ans=0.2 2024-08-15 09:33:50,356 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.78 vs. limit=15.0 2024-08-15 09:33:54,846 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 18 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-15 09:34:02,140 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.93 vs. limit=10.0 2024-08-15 09:34:18,186 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-15 09:34:38,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3117270.0, ans=0.0 2024-08-15 09:34:39,606 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 7400, loss[loss=0.09442, beats_loss=0.01276, ecapa_loss=0.0001323, whisper_loss=0.08034, over 16526.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01065, ecapa_loss=0.0001497, whisper_loss=0.09076, over 3853335.51 frames. ], batch size: 68, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:34:40,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3117270.0, ans=0.125 2024-08-15 09:35:02,682 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
27 from LS+wenet, 32 from Vox, 33 fro AS 2024-08-15 09:35:13,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3117470.0, ans=0.2 2024-08-15 09:35:19,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3117470.0, ans=0.125 2024-08-15 09:35:39,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3117570.0, ans=0.0 2024-08-15 09:35:45,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3117670.0, ans=0.125 2024-08-15 09:35:59,531 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.404e+01 2.626e+01 2.946e+01 5.024e+01, threshold=5.253e+01, percent-clipped=0.0 2024-08-15 09:36:00,059 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 34 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-15 09:36:00,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3117770.0, ans=0.2 2024-08-15 09:36:01,264 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 7450, loss[loss=0.1247, beats_loss=0.009695, ecapa_loss=0.0001527, whisper_loss=0.1135, over 21528.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01064, ecapa_loss=0.0001503, whisper_loss=0.09102, over 3876193.65 frames. ], batch size: 83, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:36:24,540 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.67 vs. limit=15.0 2024-08-15 09:36:42,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3117970.0, ans=0.125 2024-08-15 09:36:46,216 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
30 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-15 09:36:59,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3118070.0, ans=0.125 2024-08-15 09:37:15,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3118170.0, ans=0.1 2024-08-15 09:37:17,942 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 7500, loss[loss=0.1031, beats_loss=0.01108, ecapa_loss=0.0001224, whisper_loss=0.09078, over 20450.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01057, ecapa_loss=0.0001512, whisper_loss=0.09108, over 3898761.96 frames. ], batch size: 79, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:37:31,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3118270.0, ans=0.1 2024-08-15 09:37:32,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3118370.0, ans=0.1 2024-08-15 09:37:36,667 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-15 09:37:50,371 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.27 vs. limit=15.0 2024-08-15 09:37:51,038 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 26 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-15 09:37:55,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3118470.0, ans=0.125 2024-08-15 09:38:01,361 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-15 09:38:20,872 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
36 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-15 09:38:23,631 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 21 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-15 09:38:32,519 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.370e+01 2.711e+01 2.994e+01 4.451e+02, threshold=5.422e+01, percent-clipped=4.0 2024-08-15 09:38:32,542 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 7550, loss[loss=0.1143, beats_loss=0.01199, ecapa_loss=0.000138, whisper_loss=0.1009, over 22039.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01061, ecapa_loss=0.0001508, whisper_loss=0.0905, over 3866157.73 frames. ], batch size: 88, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:38:52,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3118870.0, ans=0.0 2024-08-15 09:39:33,923 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 09:39:36,619 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-15 09:39:44,637 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 29 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-15 09:39:44,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3119170.0, ans=0.125 2024-08-15 09:39:45,890 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 27 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-15 09:39:52,961 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 7600, loss[loss=0.1081, beats_loss=0.01241, ecapa_loss=0.0001546, whisper_loss=0.09418, over 13618.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01062, ecapa_loss=0.000151, whisper_loss=0.09099, over 3861879.88 frames. 
], batch size: 54, lr: 2.83e-03, grad_scale: 5.764607523034235e+17
2024-08-15 09:39:56,955 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.56 vs. limit=15.0
2024-08-15 09:40:03,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3119270.0, ans=0.125
2024-08-15 09:40:16,156 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.16 vs. limit=10.0
2024-08-15 09:40:28,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3119470.0, ans=0.2
2024-08-15 09:40:30,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3119470.0, ans=0.1
2024-08-15 09:40:36,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3119470.0, ans=0.0
2024-08-15 09:40:57,661 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 19 from Vox, 30 from AS
2024-08-15 09:41:09,870 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.254e+01 2.450e+01 2.637e+01 4.565e+01, threshold=4.900e+01, percent-clipped=0.0
2024-08-15 09:41:09,900 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 7650, loss[loss=0.1184, beats_loss=0.008371, ecapa_loss=0.0001971, whisper_loss=0.108, over 21293.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01062, ecapa_loss=0.0001509, whisper_loss=0.09101, over 3882789.41 frames. ], batch size: 91, lr: 2.83e-03, grad_scale: 5.764607523034235e+17
2024-08-15 09:41:22,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3119770.0, ans=0.2
2024-08-15 09:41:37,476 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 25 from Vox, 32 from AS
2024-08-15 09:41:41,894 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 18 from Vox, 28 from AS
2024-08-15 09:41:45,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3119970.0, ans=0.1
2024-08-15 09:41:57,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3120070.0, ans=0.0
2024-08-15 09:42:00,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3120070.0, ans=0.125
2024-08-15 09:42:05,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3120070.0, ans=0.125
2024-08-15 09:42:11,151 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 20 from Vox, 35 from AS
2024-08-15 09:42:20,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3120170.0, ans=0.0
2024-08-15 09:42:25,871 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 19 from LS+wenet, 28 from Vox, 35 from AS
2024-08-15 09:42:27,119 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 7700, loss[loss=0.08917, beats_loss=0.01148, ecapa_loss=0.0001602, whisper_loss=0.07608, over 19937.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01061, ecapa_loss=0.0001511, whisper_loss=0.09108, over 3885676.45 frames. ], batch size: 82, lr: 2.83e-03, grad_scale: 5.764607523034235e+17
2024-08-15 09:42:40,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3120270.0, ans=0.125
2024-08-15 09:42:48,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3120370.0, ans=0.125
2024-08-15 09:42:49,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3120370.0, ans=0.07
2024-08-15 09:42:53,429 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-15 09:42:54,517 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 13 from Vox, 33 from AS
2024-08-15 09:43:02,094 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.44 vs. limit=15.0
2024-08-15 09:43:13,442 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.280e-02
2024-08-15 09:43:17,145 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=14.03 vs. limit=15.0
2024-08-15 09:43:39,114 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 from AS
2024-08-15 09:43:47,270 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.349e+01 2.670e+01 3.107e+01 2.208e+02, threshold=5.341e+01, percent-clipped=1.0
2024-08-15 09:43:47,292 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 7750, loss[loss=0.08174, beats_loss=0.01237, ecapa_loss=0.0001444, whisper_loss=0.06792, over 14318.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01062, ecapa_loss=0.000151, whisper_loss=0.09046, over 3875801.75 frames. ], batch size: 57, lr: 2.83e-03, grad_scale: 5.764607523034235e+17
2024-08-15 09:44:02,625 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 19 from Vox, 19 from AS
2024-08-15 09:44:05,944 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 32 from LS+wenet, 18 from Vox, 22 from AS
2024-08-15 09:44:10,271 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.60 vs. limit=6.0
2024-08-15 09:44:22,155 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 30 from Vox, 34 from AS
2024-08-15 09:44:24,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3120970.0, ans=0.125
2024-08-15 09:44:24,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3120970.0, ans=0.04949747468305833
2024-08-15 09:44:36,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3121070.0, ans=0.125
2024-08-15 09:44:40,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3121070.0, ans=0.125
2024-08-15 09:44:54,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3121170.0, ans=0.0
2024-08-15 09:45:05,852 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 7800, loss[loss=0.06874, beats_loss=0.01445, ecapa_loss=0.000122, whisper_loss=0.05307, over 22947.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01058, ecapa_loss=0.0001506, whisper_loss=0.0909, over 3901817.22 frames. ], batch size: 92, lr: 2.83e-03, grad_scale: 5.764607523034235e+17
2024-08-15 09:45:13,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3121270.0, ans=0.0
2024-08-15 09:45:18,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3121270.0, ans=0.125
2024-08-15 09:45:21,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3121370.0, ans=0.125
2024-08-15 09:45:54,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3121570.0, ans=0.125
2024-08-15 09:46:14,333 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 35 from LS+wenet, 22 from Vox, 20 from AS
2024-08-15 09:46:14,853 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.76 vs. limit=22.5
2024-08-15 09:46:19,857 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 16 from Vox, 26 from AS
2024-08-15 09:46:25,112 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.314e+01 2.626e+01 3.075e+01 3.695e+02, threshold=5.252e+01, percent-clipped=2.0
2024-08-15 09:46:25,136 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 7850, loss[loss=0.08383, beats_loss=0.01145, ecapa_loss=0.0001529, whisper_loss=0.07085, over 16673.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01066, ecapa_loss=0.0001507, whisper_loss=0.09077, over 3888383.21 frames. ], batch size: 68, lr: 2.83e-03, grad_scale: 5.764607523034235e+17
2024-08-15 09:46:29,411 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 32 from LS+wenet, 15 from Vox, 25 from AS
2024-08-15 09:46:41,142 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.154e-01
2024-08-15 09:46:54,721 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 26 from LS+wenet, 23 from Vox, 16 from AS
2024-08-15 09:46:59,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3121970.0, ans=0.125
2024-08-15 09:47:07,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3122070.0, ans=0.0
2024-08-15 09:47:08,759 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.18 vs. limit=12.0
2024-08-15 09:47:11,079 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 16 from LS+wenet, 17 from Vox, 27 from AS
2024-08-15 09:47:11,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3122070.0, ans=0.0
2024-08-15 09:47:12,378 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 29 from LS+wenet, 19 from Vox, 33 from AS
2024-08-15 09:47:22,395 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.41 vs. limit=10.0
2024-08-15 09:47:26,844 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 29 from Vox, 21 from AS
2024-08-15 09:47:33,745 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 7900, loss[loss=0.09746, beats_loss=0.01161, ecapa_loss=0.0001415, whisper_loss=0.08444, over 22556.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01069, ecapa_loss=0.0001513, whisper_loss=0.09044, over 3859853.13 frames. ], batch size: 92, lr: 2.83e-03, grad_scale: 5.764607523034235e+17
2024-08-15 09:47:41,689 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. limit=6.0
2024-08-15 09:47:46,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3122370.0, ans=0.0
2024-08-15 09:47:57,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3122370.0, ans=0.0
2024-08-15 09:48:30,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3122670.0, ans=0.125
2024-08-15 09:48:36,122 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 15 from LS+wenet, 18 from Vox, 25 from AS
2024-08-15 09:48:38,410 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.55 vs. limit=15.0
2024-08-15 09:48:44,837 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.277e+01 2.656e+01 2.964e+01 2.137e+02, threshold=5.312e+01, percent-clipped=1.0
2024-08-15 09:48:44,862 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 7950, loss[loss=0.1266, beats_loss=0.00843, ecapa_loss=0.0001619, whisper_loss=0.1165, over 14830.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01073, ecapa_loss=0.0001513, whisper_loss=0.09021, over 3860535.88 frames. ], batch size: 55, lr: 2.83e-03, grad_scale: 5.764607523034235e+17
2024-08-15 09:49:32,552 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 40 from LS+wenet, 12 from Vox, 37 from AS
2024-08-15 09:49:34,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3123070.0, ans=0.125
2024-08-15 09:49:41,458 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 18 from Vox, 36 from AS
2024-08-15 09:49:42,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3123070.0, ans=0.05
2024-08-15 09:49:46,238 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 20 from LS+wenet, 24 from Vox, 41 from AS
2024-08-15 09:49:58,439 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.38 vs. limit=22.5
2024-08-15 09:49:58,889 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 8000, loss[loss=0.1101, beats_loss=0.01055, ecapa_loss=0.0001463, whisper_loss=0.0981, over 21731.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01069, ecapa_loss=0.00015, whisper_loss=0.09078, over 3886692.78 frames. ], batch size: 84, lr: 2.83e-03, grad_scale: 5.764607523034235e+17
2024-08-15 09:50:08,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3123270.0, ans=0.125
2024-08-15 09:50:12,427 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 24 from LS+wenet, 22 from Vox, 12 from AS
2024-08-15 09:50:12,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3123370.0, ans=0.1
2024-08-15 09:50:27,095 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.03 vs. limit=15.0
2024-08-15 09:50:32,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3123470.0, ans=0.125
2024-08-15 09:50:34,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3123470.0, ans=0.0
2024-08-15 09:50:37,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3123470.0, ans=0.2
2024-08-15 09:50:57,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3123670.0, ans=0.2
2024-08-15 09:51:10,033 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.05 vs. limit=22.5
2024-08-15 09:51:10,180 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.14 vs. limit=15.0
2024-08-15 09:51:13,540 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.705e+01 2.280e+01 2.544e+01 2.885e+01 5.910e+01, threshold=5.088e+01, percent-clipped=1.0
2024-08-15 09:51:13,561 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 8050, loss[loss=0.0903, beats_loss=0.01471, ecapa_loss=0.0001322, whisper_loss=0.07426, over 21078.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01065, ecapa_loss=0.0001506, whisper_loss=0.0906, over 3846010.59 frames. ], batch size: 87, lr: 2.83e-03, grad_scale: 5.764607523034235e+17
2024-08-15 09:51:15,762 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0
2024-08-15 09:51:21,371 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 13 from Vox, 26 from AS
2024-08-15 09:51:27,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3123870.0, ans=0.5
2024-08-15 09:51:45,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3123970.0, ans=0.0
2024-08-15 09:51:46,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3123970.0, ans=0.125
2024-08-15 09:51:48,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3123970.0, ans=0.125
2024-08-15 09:51:55,466 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 21 from Vox, 29 from AS
2024-08-15 09:51:59,911 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 22 from LS+wenet, 27 from Vox, 31 from AS
2024-08-15 09:52:09,706 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 from AS
2024-08-15 09:52:16,890 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 17 from Vox, 40 from AS
2024-08-15 09:52:19,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3124170.0, ans=0.0
2024-08-15 09:52:25,011 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 8100, loss[loss=0.1081, beats_loss=0.008945, ecapa_loss=0.0001391, whisper_loss=0.09773, over 22730.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0105, ecapa_loss=0.0001518, whisper_loss=0.09145, over 3857132.27 frames. ], batch size: 89, lr: 2.83e-03, grad_scale: 5.764607523034235e+17
2024-08-15 09:52:43,136 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=15.0
2024-08-15 09:52:48,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3124370.0, ans=0.1
2024-08-15 09:52:53,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3124470.0, ans=0.2
2024-08-15 09:52:55,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3124470.0, ans=10.0
2024-08-15 09:53:13,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3124570.0, ans=0.0
2024-08-15 09:53:19,414 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0
2024-08-15 09:53:20,243 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 27 from LS+wenet, 13 from Vox, 40 from AS
2024-08-15 09:53:40,757 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.386e+01 2.609e+01 2.958e+01 3.972e+01, threshold=5.217e+01, percent-clipped=0.0
2024-08-15 09:53:40,778 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 8150, loss[loss=0.09332, beats_loss=0.01223, ecapa_loss=0.000138, whisper_loss=0.07971, over 14673.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01046, ecapa_loss=0.0001523, whisper_loss=0.09211, over 3897462.59 frames. ], batch size: 60, lr: 2.83e-03, grad_scale: 5.764607523034235e+17
2024-08-15 09:53:44,138 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 21 from Vox, 46 from AS
2024-08-15 09:53:48,280 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 32 from LS+wenet, 20 from Vox, 32 from AS
2024-08-15 09:54:08,123 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.54 vs. limit=15.0
2024-08-15 09:54:17,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3124970.0, ans=0.1
2024-08-15 09:54:18,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3124970.0, ans=0.2
2024-08-15 09:54:41,897 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=22.5
2024-08-15 09:54:59,359 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 24 from Vox, 28 from AS
2024-08-15 09:55:06,867 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 8200, loss[loss=0.08326, beats_loss=0.01207, ecapa_loss=0.0001498, whisper_loss=0.0697, over 20748.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01054, ecapa_loss=0.0001512, whisper_loss=0.09151, over 3901279.23 frames. ], batch size: 86, lr: 2.83e-03, grad_scale: 5.764607523034235e+17
2024-08-15 09:55:40,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3125470.0, ans=0.1
2024-08-15 09:55:54,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3125570.0, ans=0.0
2024-08-15 09:55:58,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3125570.0, ans=0.95
2024-08-15 09:55:58,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3125570.0, ans=0.1
2024-08-15 09:56:05,476 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 24 from Vox, 34 from AS
2024-08-15 09:56:20,946 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08162933588027954, model_norm_threshold=52.17145538330078
2024-08-15 09:56:21,118 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.08, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.265e+04, grad_sumsq=3.250e+06, orig_rms_sq=1.005e-02
2024-08-15 09:56:23,719 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.208e+01 2.504e+01 2.791e+01 6.391e+02, threshold=5.008e+01, percent-clipped=1.0
2024-08-15 09:56:23,744 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 8250, loss[loss=0.1126, beats_loss=0.009195, ecapa_loss=0.0001495, whisper_loss=0.1019, over 22067.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01057, ecapa_loss=0.0001517, whisper_loss=0.09143, over 3914878.62 frames. ], batch size: 85, lr: 2.83e-03, grad_scale: 5.764607523034235e+17
2024-08-15 09:56:28,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3125770.0, ans=0.0
2024-08-15 09:56:33,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3125770.0, ans=0.125
2024-08-15 09:56:41,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3125870.0, ans=0.0
2024-08-15 09:56:56,658 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 from AS
2024-08-15 09:56:58,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3125970.0, ans=0.035
2024-08-15 09:56:58,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3125970.0, ans=0.0
2024-08-15 09:57:04,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3125970.0, ans=0.125
2024-08-15 09:57:07,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3126070.0, ans=15.0
2024-08-15 09:57:17,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3126070.0, ans=0.05
2024-08-15 09:57:21,905 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.62 vs. limit=15.0
2024-08-15 09:57:24,098 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 18 from LS+wenet, 32 from Vox, 42 from AS
2024-08-15 09:57:27,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3126170.0, ans=0.125
2024-08-15 09:57:32,777 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.98 vs. limit=22.5
2024-08-15 09:57:37,719 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 8300, loss[loss=0.1022, beats_loss=0.01263, ecapa_loss=0.0001225, whisper_loss=0.08833, over 22459.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01068, ecapa_loss=0.0001496, whisper_loss=0.09082, over 3918044.08 frames. ], batch size: 90, lr: 2.83e-03, grad_scale: 5.764607523034235e+17
2024-08-15 09:57:59,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3126370.0, ans=0.0
2024-08-15 09:58:00,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3126370.0, ans=0.2
2024-08-15 09:58:01,559 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 from AS
2024-08-15 09:58:10,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3126470.0, ans=0.125
2024-08-15 09:58:29,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3126570.0, ans=0.1
2024-08-15 09:58:31,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3126570.0, ans=0.0
2024-08-15 09:58:56,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3126670.0, ans=0.2
2024-08-15 09:58:56,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3126670.0, ans=0.125
2024-08-15 09:59:00,221 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.411e+01 2.673e+01 3.013e+01 2.459e+02, threshold=5.345e+01, percent-clipped=1.0
2024-08-15 09:59:00,250 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 8350, loss[loss=0.09275, beats_loss=0.01032, ecapa_loss=0.0001156, whisper_loss=0.08127, over 18588.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01063, ecapa_loss=0.0001489, whisper_loss=0.09125, over 3914648.13 frames. ], batch size: 71, lr: 2.83e-03, grad_scale: 5.764607523034235e+17
2024-08-15 09:59:08,353 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.84 vs. limit=15.0
2024-08-15 09:59:14,988 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 from AS
2024-08-15 09:59:21,801 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.34 vs. limit=22.5
2024-08-15 09:59:34,010 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 21 from Vox, 36 from AS
2024-08-15 09:59:36,708 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 35 from LS+wenet, 22 from Vox, 29 from AS
2024-08-15 10:00:01,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3127170.0, ans=0.125
2024-08-15 10:00:05,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3127170.0, ans=0.1
2024-08-15 10:00:16,547 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 8400, loss[loss=0.1074, beats_loss=0.01068, ecapa_loss=0.0001456, whisper_loss=0.09525, over 22882.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0105, ecapa_loss=0.0001508, whisper_loss=0.09174, over 3905451.54 frames. ], batch size: 92, lr: 2.83e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:00:22,459 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 8 from LS+wenet, 26 from Vox, 28 from AS
2024-08-15 10:01:28,114 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 10 from Vox, 31 from AS
2024-08-15 10:01:31,249 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.355e+01 2.603e+01 2.827e+01 7.121e+01, threshold=5.205e+01, percent-clipped=1.0
2024-08-15 10:01:31,274 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 8450, loss[loss=0.0769, beats_loss=0.01006, ecapa_loss=0.000155, whisper_loss=0.06529, over 17286.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01049, ecapa_loss=0.000151, whisper_loss=0.09188, over 3907698.25 frames. ], batch size: 64, lr: 2.82e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:01:33,171 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 from AS
2024-08-15 10:01:44,599 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 23 from Vox, 29 from AS
2024-08-15 10:01:48,892 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0
2024-08-15 10:01:53,417 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 from AS
2024-08-15 10:02:08,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3127970.0, ans=0.1
2024-08-15 10:02:28,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3128070.0, ans=0.07
2024-08-15 10:02:36,070 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 from AS
2024-08-15 10:02:50,899 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 16 from LS+wenet, 21 from Vox, 29 from AS
2024-08-15 10:02:52,275 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 8500, loss[loss=0.08483, beats_loss=0.01286, ecapa_loss=0.0001623, whisper_loss=0.07034, over 16014.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0105, ecapa_loss=0.0001507, whisper_loss=0.09168, over 3901903.90 frames. ], batch size: 66, lr: 2.82e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:02:54,993 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.39 vs. limit=10.0
2024-08-15 10:03:03,608 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 36 from LS+wenet, 17 from Vox, 35 from AS
2024-08-15 10:03:14,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3128370.0, ans=0.1
2024-08-15 10:03:18,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3128370.0, ans=0.05
2024-08-15 10:03:29,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3128470.0, ans=0.07
2024-08-15 10:03:35,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3128470.0, ans=0.1
2024-08-15 10:03:38,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3128570.0, ans=0.2
2024-08-15 10:03:47,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3128570.0, ans=0.0
2024-08-15 10:04:09,324 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 19 from LS+wenet, 26 from Vox, 26 from AS
2024-08-15 10:04:11,094 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.400e+01 2.720e+01 3.121e+01 2.458e+02, threshold=5.440e+01, percent-clipped=2.0
2024-08-15 10:04:11,115 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 8550, loss[loss=0.09348, beats_loss=0.01016, ecapa_loss=0.000198, whisper_loss=0.08133, over 16213.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01058, ecapa_loss=0.0001498, whisper_loss=0.09149, over 3920543.76 frames. ], batch size: 71, lr: 2.82e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:04:45,983 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.05 vs. limit=22.5
2024-08-15 10:04:53,809 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 20 from Vox, 46 from AS
2024-08-15 10:05:08,860 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 16 from Vox, 30 from AS
2024-08-15 10:05:25,961 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 8600, loss[loss=0.08435, beats_loss=0.01472, ecapa_loss=9.218e-05, whisper_loss=0.06871, over 20322.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01064, ecapa_loss=0.0001499, whisper_loss=0.0906, over 3891849.72 frames. ], batch size: 78, lr: 2.82e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:05:28,278 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.69 vs. limit=10.0
2024-08-15 10:05:57,262 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 from AS
2024-08-15 10:06:11,415 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 18 from Vox, 32 from AS
2024-08-15 10:06:14,948 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.25 vs. limit=15.0
2024-08-15 10:06:18,199 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.04 vs. limit=22.5
2024-08-15 10:06:20,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3129570.0, ans=0.2
2024-08-15 10:06:25,140 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 from AS
2024-08-15 10:06:27,754 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 20 from Vox, 30 from AS
2024-08-15 10:06:32,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3129670.0, ans=0.125
2024-08-15 10:06:37,933 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.410e+01 2.647e+01 2.940e+01 4.400e+01, threshold=5.294e+01, percent-clipped=0.0
2024-08-15 10:06:37,955 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 8650, loss[loss=0.09093, beats_loss=0.01224, ecapa_loss=0.0001101, whisper_loss=0.07759, over 15852.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0106, ecapa_loss=0.0001515, whisper_loss=0.09098, over 3876858.91 frames. ], batch size: 62, lr: 2.82e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:07:05,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3129870.0, ans=0.0
2024-08-15 10:07:06,260 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 13 from LS+wenet, 17 from Vox, 29 from AS
2024-08-15 10:07:18,732 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.24 vs. limit=6.0
2024-08-15 10:07:25,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3130070.0, ans=0.07
2024-08-15 10:07:25,993 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.902e-01
2024-08-15 10:07:43,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3130170.0, ans=0.0
2024-08-15 10:07:47,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3130170.0, ans=0.0
2024-08-15 10:07:48,242 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.97 vs. limit=15.0
2024-08-15 10:07:48,384 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.66 vs. limit=15.0
2024-08-15 10:08:00,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3130270.0, ans=0.0
2024-08-15 10:08:00,915 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 8700, loss[loss=0.1171, beats_loss=0.007699, ecapa_loss=0.0001763, whisper_loss=0.1076, over 14152.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01055, ecapa_loss=0.0001511, whisper_loss=0.09083, over 3881056.85 frames. ], batch size: 55, lr: 2.82e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:08:02,663 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 28 from Vox, 30 from AS
2024-08-15 10:08:20,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3130370.0, ans=0.125
2024-08-15 10:08:45,380 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-15 10:08:52,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3130570.0, ans=0.125
2024-08-15 10:09:02,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3130570.0, ans=0.125
2024-08-15 10:09:06,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3130670.0, ans=0.0
2024-08-15 10:09:22,509 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.333e+01 2.545e+01 2.859e+01 2.640e+02, threshold=5.090e+01, percent-clipped=1.0
2024-08-15 10:09:22,530 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 8750, loss[loss=0.0972, beats_loss=0.01105, ecapa_loss=0.0001329, whisper_loss=0.08483, over 20497.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0105, ecapa_loss=0.0001514, whisper_loss=0.09119, over 3850481.80 frames. ], batch size: 80, lr: 2.82e-03, grad_scale: 5.764607523034235e+17
2024-08-15 10:09:34,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3130770.0, ans=0.125
2024-08-15 10:09:56,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3130970.0, ans=0.125
2024-08-15 10:10:01,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3130970.0, ans=0.125
2024-08-15 10:10:05,206 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts.
25 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-15 10:10:05,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3130970.0, ans=0.125 2024-08-15 10:10:38,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3131170.0, ans=0.0 2024-08-15 10:10:39,216 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 10:10:41,457 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 8800, loss[loss=0.1027, beats_loss=0.009585, ecapa_loss=0.0001461, whisper_loss=0.09169, over 18787.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01055, ecapa_loss=0.0001502, whisper_loss=0.0911, over 3871600.32 frames. ], batch size: 76, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:10:53,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3131270.0, ans=0.025 2024-08-15 10:11:06,767 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-15 10:11:37,832 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-15 10:11:54,617 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.280e+01 2.531e+01 2.796e+01 4.372e+01, threshold=5.061e+01, percent-clipped=0.0 2024-08-15 10:11:54,641 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 8850, loss[loss=0.09255, beats_loss=0.01326, ecapa_loss=0.0001153, whisper_loss=0.07813, over 16386.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0106, ecapa_loss=0.0001503, whisper_loss=0.09121, over 3862341.58 frames. 
], batch size: 66, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:11:59,787 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.81 vs. limit=22.5 2024-08-15 10:12:22,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3131870.0, ans=0.125 2024-08-15 10:12:34,551 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-15 10:13:05,612 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 17 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-15 10:13:08,896 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 8900, loss[loss=0.1048, beats_loss=0.009659, ecapa_loss=0.0001508, whisper_loss=0.0936, over 17203.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0106, ecapa_loss=0.0001505, whisper_loss=0.09137, over 3860980.77 frames. ], batch size: 69, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:13:09,177 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-15 10:13:11,645 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 19 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-15 10:13:11,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3132270.0, ans=0.125 2024-08-15 10:13:20,525 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-15 10:13:25,407 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.94 vs. limit=15.0 2024-08-15 10:13:33,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3132370.0, ans=0.0 2024-08-15 10:13:39,495 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
29 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-15 10:13:48,542 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-15 10:14:12,482 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 17 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-15 10:14:17,791 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 19 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-15 10:14:18,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3132770.0, ans=0.1 2024-08-15 10:14:18,808 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.382e+01 2.533e+01 2.856e+01 1.200e+02, threshold=5.066e+01, percent-clipped=2.0 2024-08-15 10:14:18,831 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 8950, loss[loss=0.094, beats_loss=0.01066, ecapa_loss=0.0001482, whisper_loss=0.08185, over 18066.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01073, ecapa_loss=0.0001485, whisper_loss=0.09062, over 3860412.97 frames. ], batch size: 72, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:14:24,635 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 29 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-15 10:14:33,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3132870.0, ans=0.125 2024-08-15 10:14:35,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3132870.0, ans=0.125 2024-08-15 10:14:50,231 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 21 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-15 10:15:01,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3133070.0, ans=0.125 2024-08-15 10:15:08,584 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
18 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-15 10:15:11,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3133070.0, ans=0.125 2024-08-15 10:15:21,674 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 10:15:24,505 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.69 vs. limit=10.0 2024-08-15 10:15:29,269 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 9000, loss[loss=0.08576, beats_loss=0.01147, ecapa_loss=0.0001558, whisper_loss=0.07274, over 21259.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01078, ecapa_loss=0.0001498, whisper_loss=0.08986, over 3874251.15 frames. ], batch size: 90, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:15:29,270 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-15 10:16:12,778 INFO [train_multi_KD3.py:1149] (2/4) Epoch 22, validation on ASR_libri: loss=0.2522, beats_loss=0, ecapa_loss=0.0005364, whisper_loss=0.2468, over 922467.00 frames. 2024-08-15 10:16:26,093 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.7959, 2.9834, 3.2823, 2.6846], device='cuda:2') 2024-08-15 10:16:35,746 INFO [train_multi_KD3.py:1149] (2/4) Epoch 22, validation on SV_voxceleb1: loss=0.004068, beats_loss=0, ecapa_loss=0.0004068, whisper_loss=0, over 939242.00 frames. 2024-08-15 10:18:37,937 INFO [train_multi_KD3.py:1149] (2/4) Epoch 22, validation on AT_audioset: loss=0.02332, beats_loss=0.02332, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-15 10:18:37,941 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-15 10:19:07,445 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 10:19:24,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3133570.0, ans=0.0 2024-08-15 10:19:25,721 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3133570.0, ans=0.1 2024-08-15 10:19:47,855 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.99 vs. limit=12.0 2024-08-15 10:19:48,046 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.223e+01 2.558e+01 2.882e+01 3.996e+01, threshold=5.117e+01, percent-clipped=0.0 2024-08-15 10:19:48,078 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 9050, loss[loss=0.0872, beats_loss=0.01061, ecapa_loss=0.0001615, whisper_loss=0.07498, over 19284.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01073, ecapa_loss=0.0001499, whisper_loss=0.08946, over 3822408.06 frames. ], batch size: 79, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:20:09,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3133870.0, ans=0.125 2024-08-15 10:20:17,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3133970.0, ans=0.0 2024-08-15 10:20:36,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3134070.0, ans=0.0 2024-08-15 10:20:44,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3134170.0, ans=0.2 2024-08-15 10:20:49,925 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
20 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-15 10:20:51,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3134170.0, ans=0.2 2024-08-15 10:20:57,597 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 9100, loss[loss=0.0886, beats_loss=0.008978, ecapa_loss=0.0001259, whisper_loss=0.07837, over 16127.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0106, ecapa_loss=0.0001522, whisper_loss=0.09025, over 3851689.05 frames. ], batch size: 58, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:20:57,790 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 15 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-15 10:20:58,329 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.82 vs. limit=10.0 2024-08-15 10:21:08,345 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.15 vs. limit=6.0 2024-08-15 10:21:17,318 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 16 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-15 10:21:22,915 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 30 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-15 10:21:29,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3134470.0, ans=0.1 2024-08-15 10:21:37,405 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.78 vs. limit=15.0 2024-08-15 10:21:38,021 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 15 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-15 10:21:39,896 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
20 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-15 10:21:44,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3134570.0, ans=0.0 2024-08-15 10:21:52,592 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-15 10:21:54,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3134670.0, ans=0.0 2024-08-15 10:21:55,319 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-15 10:22:08,892 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.360e+01 2.655e+01 2.981e+01 2.632e+02, threshold=5.310e+01, percent-clipped=2.0 2024-08-15 10:22:08,917 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 9150, loss[loss=0.0995, beats_loss=0.01136, ecapa_loss=0.0001332, whisper_loss=0.0868, over 19345.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01059, ecapa_loss=0.0001529, whisper_loss=0.09077, over 3852700.20 frames. ], batch size: 74, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:22:30,361 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.73 vs. limit=15.0 2024-08-15 10:22:35,347 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-15 10:22:48,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3134970.0, ans=0.95 2024-08-15 10:22:48,955 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.35 vs. 
limit=10.0 2024-08-15 10:22:50,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3135070.0, ans=0.125 2024-08-15 10:22:59,821 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 14 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-15 10:23:02,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3135070.0, ans=0.2 2024-08-15 10:23:02,961 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.21 vs. limit=15.0 2024-08-15 10:23:09,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3135170.0, ans=0.1 2024-08-15 10:23:23,195 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 9200, loss[loss=0.08649, beats_loss=0.01265, ecapa_loss=0.0001404, whisper_loss=0.07243, over 21869.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01073, ecapa_loss=0.0001521, whisper_loss=0.09027, over 3891184.08 frames. ], batch size: 87, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:23:40,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3135370.0, ans=0.125 2024-08-15 10:23:46,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3135370.0, ans=0.125 2024-08-15 10:23:53,872 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.88 vs. limit=22.5 2024-08-15 10:24:22,920 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-15 10:24:37,869 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
25 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-15 10:24:46,679 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.313e+01 2.628e+01 2.847e+01 1.469e+02, threshold=5.255e+01, percent-clipped=1.0 2024-08-15 10:24:46,707 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 9250, loss[loss=0.1048, beats_loss=0.01023, ecapa_loss=0.0001749, whisper_loss=0.09282, over 19355.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01067, ecapa_loss=0.0001522, whisper_loss=0.09017, over 3887854.21 frames. ], batch size: 76, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:24:54,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3135770.0, ans=0.125 2024-08-15 10:24:54,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3135770.0, ans=0.0 2024-08-15 10:24:59,611 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-15 10:25:04,546 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 36 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-15 10:25:14,007 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 28 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-15 10:25:14,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3135870.0, ans=0.1 2024-08-15 10:25:17,752 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-15 10:25:26,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3135970.0, ans=0.0 2024-08-15 10:25:38,821 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 27 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-15 10:25:55,491 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
16 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-15 10:26:06,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3136170.0, ans=0.1 2024-08-15 10:26:08,787 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-15 10:26:13,809 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 9300, loss[loss=0.09759, beats_loss=0.01048, ecapa_loss=0.0001332, whisper_loss=0.08577, over 16063.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01064, ecapa_loss=0.0001525, whisper_loss=0.09029, over 3907468.24 frames. ], batch size: 63, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:26:14,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3136270.0, ans=0.1 2024-08-15 10:26:26,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3136270.0, ans=0.125 2024-08-15 10:26:39,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3136370.0, ans=0.2 2024-08-15 10:26:51,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3136470.0, ans=0.125 2024-08-15 10:26:56,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3136470.0, ans=0.125 2024-08-15 10:27:22,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3136670.0, ans=0.2 2024-08-15 10:27:26,479 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.363e+01 2.589e+01 2.922e+01 5.036e+01, threshold=5.179e+01, percent-clipped=0.0 2024-08-15 10:27:26,498 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 9350, loss[loss=0.09819, beats_loss=0.01251, ecapa_loss=0.0001417, 
whisper_loss=0.08427, over 19094.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01063, ecapa_loss=0.0001529, whisper_loss=0.09049, over 3907054.82 frames. ], batch size: 77, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:27:27,306 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.34 vs. limit=15.0 2024-08-15 10:27:30,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3136770.0, ans=0.0 2024-08-15 10:27:53,453 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-15 10:27:55,495 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.67 vs. limit=15.0 2024-08-15 10:28:02,352 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 10:28:08,284 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0 2024-08-15 10:28:10,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3137070.0, ans=0.05 2024-08-15 10:28:25,085 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-15 10:28:26,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3137170.0, ans=0.125 2024-08-15 10:28:32,352 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-15 10:28:35,964 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 9400, loss[loss=0.08159, beats_loss=0.01349, ecapa_loss=0.0001107, whisper_loss=0.067, over 20702.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01069, ecapa_loss=0.0001526, whisper_loss=0.09027, over 3920165.08 frames. ], batch size: 84, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:28:39,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3137270.0, ans=0.125 2024-08-15 10:28:40,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3137270.0, ans=0.05 2024-08-15 10:28:43,614 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.03 vs. limit=15.0 2024-08-15 10:28:56,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3137370.0, ans=0.125 2024-08-15 10:29:11,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3137470.0, ans=0.2 2024-08-15 10:29:19,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3137570.0, ans=0.95 2024-08-15 10:29:20,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3137570.0, ans=0.5 2024-08-15 10:29:22,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3137570.0, ans=0.0 2024-08-15 10:29:45,423 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.347e+01 2.545e+01 2.871e+01 4.993e+01, threshold=5.089e+01, percent-clipped=0.0 2024-08-15 10:29:45,445 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 9450, loss[loss=0.06928, beats_loss=0.01475, ecapa_loss=0.0001403, whisper_loss=0.05313, over 12849.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01069, ecapa_loss=0.0001523, whisper_loss=0.08992, over 3908829.19 frames. ], batch size: 55, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:29:48,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3137770.0, ans=0.125 2024-08-15 10:29:50,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3137770.0, ans=0.125 2024-08-15 10:29:55,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3137770.0, ans=0.125 2024-08-15 10:29:55,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3137770.0, ans=0.125 2024-08-15 10:30:06,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3137870.0, ans=0.125 2024-08-15 10:30:12,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3137970.0, ans=15.0 2024-08-15 10:30:15,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3137970.0, ans=0.0 2024-08-15 10:30:20,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3137970.0, ans=0.125 2024-08-15 10:30:27,692 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.57 vs. 
limit=15.0 2024-08-15 10:30:31,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3138070.0, ans=0.1 2024-08-15 10:30:35,881 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.76 vs. limit=15.0 2024-08-15 10:30:54,635 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 9500, loss[loss=0.1047, beats_loss=0.008718, ecapa_loss=0.0001684, whisper_loss=0.09425, over 17800.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01067, ecapa_loss=0.0001517, whisper_loss=0.08955, over 3889414.46 frames. ], batch size: 70, lr: 2.82e-03, grad_scale: 1.152921504606847e+18 2024-08-15 10:30:56,182 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-15 10:31:27,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3138470.0, ans=0.125 2024-08-15 10:31:37,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3138570.0, ans=0.2 2024-08-15 10:31:40,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3138570.0, ans=0.125 2024-08-15 10:31:42,749 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 25 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 10:31:56,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3138670.0, ans=0.1 2024-08-15 10:31:58,006 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
21 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-15 10:32:01,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3138670.0, ans=0.125 2024-08-15 10:32:01,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3138670.0, ans=0.125 2024-08-15 10:32:03,579 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.464e+01 2.657e+01 3.059e+01 1.936e+02, threshold=5.313e+01, percent-clipped=3.0 2024-08-15 10:32:03,600 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 9550, loss[loss=0.1115, beats_loss=0.01064, ecapa_loss=0.0001467, whisper_loss=0.09941, over 21653.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01067, ecapa_loss=0.0001524, whisper_loss=0.08929, over 3892842.40 frames. ], batch size: 85, lr: 2.82e-03, grad_scale: 1.152921504606847e+18 2024-08-15 10:32:20,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3138870.0, ans=0.0 2024-08-15 10:32:23,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3138870.0, ans=0.2 2024-08-15 10:32:35,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3138970.0, ans=0.125 2024-08-15 10:32:40,666 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
36 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-15 10:32:51,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3139070.0, ans=0.125 2024-08-15 10:33:12,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3139170.0, ans=0.125 2024-08-15 10:33:15,636 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 9600, loss[loss=0.1089, beats_loss=0.01079, ecapa_loss=0.0001548, whisper_loss=0.09654, over 18981.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01061, ecapa_loss=0.0001524, whisper_loss=0.08933, over 3863180.48 frames. ], batch size: 75, lr: 2.82e-03, grad_scale: 1.152921504606847e+18 2024-08-15 10:33:15,910 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 25 from LS+wenet, 35 from Vox, 35 fro AS 2024-08-15 10:33:20,762 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2024-08-15 10:33:23,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3139270.0, ans=0.0 2024-08-15 10:33:30,006 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.17 vs. limit=15.0 2024-08-15 10:33:56,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3139470.0, ans=0.1 2024-08-15 10:34:14,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3139670.0, ans=0.125 2024-08-15 10:34:17,183 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
22 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-15 10:34:17,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3139670.0, ans=0.07 2024-08-15 10:34:26,679 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 9650, loss[loss=0.1089, beats_loss=0.008297, ecapa_loss=0.0001589, whisper_loss=0.09902, over 14148.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01061, ecapa_loss=0.0001518, whisper_loss=0.08907, over 3850365.30 frames. ], batch size: 55, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:34:27,876 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.224e+01 2.493e+01 2.795e+01 4.633e+01, threshold=4.986e+01, percent-clipped=0.0 2024-08-15 10:34:31,960 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.60 vs. limit=15.0 2024-08-15 10:34:57,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3139970.0, ans=0.125 2024-08-15 10:35:04,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3139970.0, ans=0.0 2024-08-15 10:35:14,266 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 39 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-15 10:35:16,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3140070.0, ans=22.5 2024-08-15 10:35:30,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3140170.0, ans=0.125 2024-08-15 10:35:30,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3140170.0, ans=0.125 2024-08-15 10:35:31,845 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 
24 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-15 10:35:32,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3140170.0, ans=0.1 2024-08-15 10:35:36,018 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 9700, loss[loss=0.09605, beats_loss=0.009014, ecapa_loss=0.0001662, whisper_loss=0.08538, over 16815.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01055, ecapa_loss=0.0001537, whisper_loss=0.0902, over 3884372.20 frames. ], batch size: 66, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:35:38,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3140270.0, ans=0.125 2024-08-15 10:35:44,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3140270.0, ans=0.0 2024-08-15 10:35:48,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3140370.0, ans=0.2 2024-08-15 10:35:54,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3140370.0, ans=0.1 2024-08-15 10:35:57,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3140370.0, ans=0.125 2024-08-15 10:36:01,774 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.79 vs. limit=15.0 2024-08-15 10:36:04,583 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.26 vs. limit=15.0 2024-08-15 10:36:05,092 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
23 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-15 10:36:14,726 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 15 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-15 10:36:33,459 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.07 vs. limit=22.5 2024-08-15 10:36:38,563 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 24 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-15 10:36:45,567 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 9750, loss[loss=0.1213, beats_loss=0.01005, ecapa_loss=0.0001411, whisper_loss=0.1098, over 23140.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0106, ecapa_loss=0.0001519, whisper_loss=0.08952, over 3876263.05 frames. ], batch size: 91, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:36:46,849 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.354e+01 2.591e+01 2.841e+01 9.647e+01, threshold=5.183e+01, percent-clipped=2.0 2024-08-15 10:36:48,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3140770.0, ans=0.125 2024-08-15 10:37:11,780 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 17 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-15 10:37:31,924 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 28 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-15 10:37:33,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3141070.0, ans=0.07 2024-08-15 10:37:36,798 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.57 vs. limit=15.0 2024-08-15 10:37:44,589 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
26 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-15 10:37:46,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3141170.0, ans=0.125 2024-08-15 10:37:55,837 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 9800, loss[loss=0.1282, beats_loss=0.008779, ecapa_loss=0.0001357, whisper_loss=0.118, over 23936.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01058, ecapa_loss=0.0001527, whisper_loss=0.09012, over 3881518.10 frames. ], batch size: 90, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:38:09,493 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.81 vs. limit=15.0 2024-08-15 10:38:12,482 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.47 vs. limit=6.0 2024-08-15 10:38:20,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3141370.0, ans=0.125 2024-08-15 10:38:23,480 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.66 vs. limit=22.5 2024-08-15 10:38:32,940 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-15 10:38:34,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3141470.0, ans=0.0 2024-08-15 10:38:35,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3141470.0, ans=0.1 2024-08-15 10:38:41,108 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
25 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-15 10:38:49,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3141570.0, ans=0.0 2024-08-15 10:38:58,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3141670.0, ans=0.125 2024-08-15 10:39:03,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3141670.0, ans=0.2 2024-08-15 10:39:05,426 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 9850, loss[loss=0.09815, beats_loss=0.01055, ecapa_loss=0.0001543, whisper_loss=0.08605, over 20439.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01065, ecapa_loss=0.0001514, whisper_loss=0.09044, over 3902273.98 frames. ], batch size: 85, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:39:06,748 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.699e+01 2.300e+01 2.506e+01 2.923e+01 9.908e+01, threshold=5.012e+01, percent-clipped=1.0 2024-08-15 10:39:11,739 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.62 vs. limit=12.0 2024-08-15 10:39:40,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3141970.0, ans=0.125 2024-08-15 10:39:41,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3141970.0, ans=0.125 2024-08-15 10:39:54,237 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=15.0 2024-08-15 10:40:13,763 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 9900, loss[loss=0.09129, beats_loss=0.01021, ecapa_loss=0.0001253, whisper_loss=0.07982, over 17741.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01067, ecapa_loss=0.0001512, whisper_loss=0.09061, over 3923063.67 frames. ], batch size: 68, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:40:14,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3142270.0, ans=0.0 2024-08-15 10:40:41,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3142470.0, ans=0.125 2024-08-15 10:40:53,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3142570.0, ans=0.125 2024-08-15 10:40:55,668 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.24 vs. limit=15.0 2024-08-15 10:40:57,984 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 10:41:15,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3142670.0, ans=0.0 2024-08-15 10:41:17,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3142670.0, ans=0.1 2024-08-15 10:41:22,147 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 9950, loss[loss=0.09614, beats_loss=0.01099, ecapa_loss=0.000145, whisper_loss=0.0837, over 20643.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.0001521, whisper_loss=0.09093, over 3910478.07 frames. 
], batch size: 81, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:41:24,921 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.404e+01 2.640e+01 2.920e+01 4.147e+01, threshold=5.279e+01, percent-clipped=0.0 2024-08-15 10:41:44,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3142870.0, ans=0.1 2024-08-15 10:42:02,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3142970.0, ans=0.0 2024-08-15 10:42:13,075 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 16 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-15 10:42:32,807 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 10000, loss[loss=0.08147, beats_loss=0.01256, ecapa_loss=0.0001446, whisper_loss=0.06746, over 16707.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01067, ecapa_loss=0.0001516, whisper_loss=0.09039, over 3882140.68 frames. ], batch size: 70, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:42:35,046 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-15 10:42:47,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3143370.0, ans=0.0 2024-08-15 10:42:47,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3143370.0, ans=0.125 2024-08-15 10:42:52,564 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 18 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-15 10:43:04,274 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
19 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-15 10:43:18,271 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.534e+05 2024-08-15 10:43:33,663 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.13 vs. limit=15.0 2024-08-15 10:43:38,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3143670.0, ans=0.0 2024-08-15 10:43:41,502 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-15 10:43:50,139 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 10050, loss[loss=0.1019, beats_loss=0.01024, ecapa_loss=0.0001357, whisper_loss=0.0903, over 23124.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01068, ecapa_loss=0.0001508, whisper_loss=0.09034, over 3857139.13 frames. ], batch size: 90, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:43:53,503 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.380e+01 2.609e+01 2.956e+01 1.893e+02, threshold=5.219e+01, percent-clipped=1.0 2024-08-15 10:44:05,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3143770.0, ans=0.125 2024-08-15 10:44:23,151 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 
16 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-15 10:44:28,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3143970.0, ans=0.125 2024-08-15 10:44:34,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3143970.0, ans=0.125 2024-08-15 10:44:38,168 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.569e+01 2024-08-15 10:44:40,347 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-15 10:44:49,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3144070.0, ans=0.0 2024-08-15 10:45:22,258 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-15 10:45:28,057 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 10100, loss[loss=0.074, beats_loss=0.01391, ecapa_loss=0.0001664, whisper_loss=0.05843, over 12709.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01065, ecapa_loss=0.0001511, whisper_loss=0.0902, over 3844140.86 frames. ], batch size: 54, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:45:39,232 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-15 10:45:52,155 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 40 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-15 10:46:11,158 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0 2024-08-15 10:46:11,815 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 25 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-15 10:46:45,317 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
18 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-15 10:46:49,311 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.58 vs. limit=15.0 2024-08-15 10:47:06,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3144670.0, ans=0.2 2024-08-15 10:47:17,870 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.48 vs. limit=22.5 2024-08-15 10:47:24,269 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 10150, loss[loss=0.09556, beats_loss=0.01176, ecapa_loss=0.0001296, whisper_loss=0.0825, over 23401.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.0001526, whisper_loss=0.09093, over 3873832.78 frames. ], batch size: 93, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:47:26,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3144770.0, ans=0.1 2024-08-15 10:47:29,575 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.324e+01 2.588e+01 2.924e+01 3.968e+01, threshold=5.175e+01, percent-clipped=0.0 2024-08-15 10:48:11,342 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.25 vs. limit=22.5 2024-08-15 10:48:15,186 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-15 10:48:22,652 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.46 vs. limit=22.5 2024-08-15 10:48:25,518 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-15 10:48:29,952 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 10:48:40,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3145070.0, ans=0.125 2024-08-15 10:48:54,374 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-15 10:49:01,527 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 16 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-15 10:49:06,183 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 10200, loss[loss=0.07177, beats_loss=0.012, ecapa_loss=0.0001394, whisper_loss=0.05838, over 15133.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01065, ecapa_loss=0.0001514, whisper_loss=0.0904, over 3841541.57 frames. ], batch size: 63, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:49:19,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3145270.0, ans=0.0 2024-08-15 10:49:39,974 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.39 vs. limit=15.0 2024-08-15 10:49:41,494 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.55 vs. limit=22.5 2024-08-15 10:49:42,299 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-15 10:49:48,082 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-15 10:49:57,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3145570.0, ans=0.0 2024-08-15 10:50:03,925 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
19 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-15 10:50:06,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3145670.0, ans=0.125 2024-08-15 10:50:20,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3145670.0, ans=0.0 2024-08-15 10:50:23,390 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 10250, loss[loss=0.1004, beats_loss=0.007725, ecapa_loss=0.00015, whisper_loss=0.09115, over 20285.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01059, ecapa_loss=0.000151, whisper_loss=0.09081, over 3868975.42 frames. ], batch size: 78, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:50:23,631 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-15 10:50:26,709 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.258e+01 2.433e+01 2.798e+01 3.625e+01, threshold=4.866e+01, percent-clipped=0.0 2024-08-15 10:50:53,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3145970.0, ans=0.0 2024-08-15 10:51:06,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3145970.0, ans=0.125 2024-08-15 10:51:09,190 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.518e-03 2024-08-15 10:51:16,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3146070.0, ans=0.125 2024-08-15 10:51:17,574 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.41 vs. limit=15.0 2024-08-15 10:51:18,041 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
11 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-15 10:51:42,149 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 10300, loss[loss=0.1148, beats_loss=0.01021, ecapa_loss=0.0001569, whisper_loss=0.1031, over 14941.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01057, ecapa_loss=0.0001513, whisper_loss=0.09076, over 3855186.12 frames. ], batch size: 62, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:51:57,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3146270.0, ans=0.125 2024-08-15 10:52:20,489 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.22 vs. limit=15.0 2024-08-15 10:52:25,361 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-15 10:52:35,282 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0 2024-08-15 10:52:55,526 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-15 10:52:57,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3146670.0, ans=0.0 2024-08-15 10:53:04,686 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-15 10:53:05,768 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 10350, loss[loss=0.1145, beats_loss=0.008632, ecapa_loss=0.0001518, whisper_loss=0.1043, over 15932.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0107, ecapa_loss=0.0001497, whisper_loss=0.09004, over 3882668.82 frames. 
], batch size: 64, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:53:08,856 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.405e+01 2.644e+01 3.063e+01 2.497e+02, threshold=5.287e+01, percent-clipped=1.0 2024-08-15 10:53:15,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3146770.0, ans=0.2 2024-08-15 10:53:24,507 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.783e+01 2024-08-15 10:53:25,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3146870.0, ans=0.125 2024-08-15 10:53:29,580 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 14 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-15 10:53:42,633 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-15 10:53:49,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3146970.0, ans=0.1 2024-08-15 10:53:50,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3146970.0, ans=0.0 2024-08-15 10:53:51,998 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0 2024-08-15 10:53:54,708 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 21 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-15 10:54:16,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3147170.0, ans=0.125 2024-08-15 10:54:28,430 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 10400, loss[loss=0.08992, beats_loss=0.01341, ecapa_loss=0.000113, whisper_loss=0.07538, over 23922.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.01075, ecapa_loss=0.0001499, whisper_loss=0.08949, over 3875787.27 frames. ], batch size: 96, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:54:31,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3147270.0, ans=0.0 2024-08-15 10:54:34,278 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-15 10:54:46,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3147370.0, ans=0.0 2024-08-15 10:54:52,770 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-15 10:54:54,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3147370.0, ans=0.125 2024-08-15 10:55:25,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3147570.0, ans=15.0 2024-08-15 10:55:30,549 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.60 vs. limit=15.0 2024-08-15 10:55:52,236 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 10450, loss[loss=0.1163, beats_loss=0.01049, ecapa_loss=0.0001018, whisper_loss=0.1048, over 19906.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01074, ecapa_loss=0.0001493, whisper_loss=0.08914, over 3837654.09 frames. 
], batch size: 72, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:55:54,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3147770.0, ans=0.025 2024-08-15 10:55:55,002 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.272e+01 2.480e+01 2.758e+01 4.514e+01, threshold=4.959e+01, percent-clipped=0.0 2024-08-15 10:55:55,255 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 24 from LS+wenet, 18 from Vox, 16 fro AS 2024-08-15 10:56:03,186 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-15 10:56:06,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3147870.0, ans=0.1 2024-08-15 10:56:09,423 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 16 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-15 10:56:13,430 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 22 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-15 10:56:28,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3147970.0, ans=0.125 2024-08-15 10:56:51,239 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 23 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-15 10:57:02,274 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-15 10:57:08,657 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 10500, loss[loss=0.08821, beats_loss=0.01149, ecapa_loss=0.0001404, whisper_loss=0.07532, over 14474.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01066, ecapa_loss=0.0001494, whisper_loss=0.08982, over 3837808.60 frames. 
], batch size: 56, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:57:10,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3148270.0, ans=0.2 2024-08-15 10:57:16,646 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 10:57:25,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3148370.0, ans=0.125 2024-08-15 10:57:29,067 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-15 10:57:50,557 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.51 vs. limit=10.0 2024-08-15 10:57:53,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3148470.0, ans=0.95 2024-08-15 10:57:56,069 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-15 10:58:03,320 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-15 10:58:19,520 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-15 10:58:24,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3148670.0, ans=0.125 2024-08-15 10:58:27,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3148670.0, ans=0.0 2024-08-15 10:58:29,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3148670.0, ans=0.125 2024-08-15 10:58:31,831 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 10550, loss[loss=0.09776, beats_loss=0.01058, ecapa_loss=0.0001456, whisper_loss=0.08573, over 16469.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01064, ecapa_loss=0.0001501, whisper_loss=0.0897, over 3834673.24 frames. ], batch size: 64, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:58:34,823 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.372e+01 2.650e+01 2.883e+01 3.926e+01, threshold=5.299e+01, percent-clipped=0.0 2024-08-15 10:58:45,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3148870.0, ans=0.125 2024-08-15 10:58:46,939 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 21 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-15 10:58:49,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3148870.0, ans=0.125 2024-08-15 10:58:50,850 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.35 vs. limit=10.0 2024-08-15 10:59:39,633 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
28 from LS+wenet, 23 from Vox, 42 from AS 2024-08-15 10:59:40,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3149170.0, ans=0.025 2024-08-15 10:59:49,023 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 10600, loss[loss=0.1201, beats_loss=0.008076, ecapa_loss=0.0001869, whisper_loss=0.1101, over 23036.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01061, ecapa_loss=0.0001506, whisper_loss=0.0903, over 3858697.87 frames. ], batch size: 92, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:59:57,808 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 19 from LS+wenet, 22 from Vox, 44 from AS 2024-08-15 11:00:07,129 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 34 from LS+wenet, 20 from Vox, 40 from AS 2024-08-15 11:00:19,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3149470.0, ans=0.0 2024-08-15 11:00:19,838 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.17 vs. limit=22.5 2024-08-15 11:00:22,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3149470.0, ans=0.07 2024-08-15 11:00:37,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3149570.0, ans=0.125 2024-08-15 11:01:00,875 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 17 from Vox, 44 from AS 2024-08-15 11:01:01,562 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. 
limit=6.0 2024-08-15 11:01:02,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3149670.0, ans=0.125 2024-08-15 11:01:03,716 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 from AS 2024-08-15 11:01:06,361 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 10650, loss[loss=0.1251, beats_loss=0.007799, ecapa_loss=0.0001804, whisper_loss=0.1155, over 21599.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01063, ecapa_loss=0.0001491, whisper_loss=0.09029, over 3839029.66 frames. ], batch size: 86, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:01:06,911 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 32 from Vox, 31 from AS 2024-08-15 11:01:07,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3149770.0, ans=0.0 2024-08-15 11:01:09,341 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.413e+01 2.629e+01 2.898e+01 3.897e+01, threshold=5.257e+01, percent-clipped=0.0 2024-08-15 11:01:26,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3149870.0, ans=0.0 2024-08-15 11:01:27,247 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 18 from Vox, 41 from AS 2024-08-15 11:01:30,263 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.40 vs. limit=15.0 2024-08-15 11:01:38,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3149870.0, ans=0.125 2024-08-15 11:01:44,309 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.60 vs. 
limit=15.0 2024-08-15 11:01:53,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3149970.0, ans=0.0 2024-08-15 11:01:53,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3149970.0, ans=0.0 2024-08-15 11:01:54,324 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 11 from Vox, 45 from AS 2024-08-15 11:02:07,790 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 23 from LS+wenet, 18 from Vox, 36 from AS 2024-08-15 11:02:08,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3150070.0, ans=0.125 2024-08-15 11:02:14,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3150170.0, ans=0.025 2024-08-15 11:02:23,164 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 from AS 2024-08-15 11:02:23,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3150170.0, ans=0.2 2024-08-15 11:02:28,152 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 17 from Vox, 25 from AS 2024-08-15 11:02:30,777 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 10700, loss[loss=0.1096, beats_loss=0.01039, ecapa_loss=0.0001373, whisper_loss=0.09781, over 22937.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01069, ecapa_loss=0.0001489, whisper_loss=0.09058, over 3891355.78 frames. ], batch size: 91, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:02:58,997 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 21 from LS+wenet, 22 from Vox, 31 from AS 2024-08-15 11:03:03,336 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
24 from LS+wenet, 21 from Vox, 41 from AS 2024-08-15 11:03:06,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3150470.0, ans=0.0 2024-08-15 11:03:12,030 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 21 from LS+wenet, 28 from Vox, 32 from AS 2024-08-15 11:03:23,044 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 23 from Vox, 44 from AS 2024-08-15 11:03:32,825 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.32 vs. limit=15.0 2024-08-15 11:03:36,964 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.78 vs. limit=22.5 2024-08-15 11:03:42,956 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 23 from LS+wenet, 22 from Vox, 34 from AS 2024-08-15 11:03:44,584 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 10750, loss[loss=0.1013, beats_loss=0.01231, ecapa_loss=0.00016, whisper_loss=0.08744, over 19242.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01075, ecapa_loss=0.0001491, whisper_loss=0.09045, over 3889124.36 frames. ], batch size: 79, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:03:47,408 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.262e+01 2.469e+01 2.772e+01 4.273e+01, threshold=4.939e+01, percent-clipped=0.0 2024-08-15 11:03:48,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3150770.0, ans=0.0 2024-08-15 11:03:57,198 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.58 vs. 
limit=15.0 2024-08-15 11:03:59,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3150870.0, ans=0.125 2024-08-15 11:04:05,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3150870.0, ans=0.0 2024-08-15 11:04:06,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3150870.0, ans=0.125 2024-08-15 11:04:09,754 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 25 from LS+wenet, 18 from Vox, 26 from AS 2024-08-15 11:04:20,267 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 26 from LS+wenet, 17 from Vox, 26 from AS 2024-08-15 11:04:29,019 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.74 vs. limit=15.0 2024-08-15 11:04:42,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3151170.0, ans=0.125 2024-08-15 11:04:43,090 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.51 vs. limit=22.5 2024-08-15 11:04:46,458 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.50 vs. limit=6.0 2024-08-15 11:04:58,150 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 10800, loss[loss=0.105, beats_loss=0.01174, ecapa_loss=0.0001225, whisper_loss=0.09205, over 16767.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0107, ecapa_loss=0.0001486, whisper_loss=0.09087, over 3887216.96 frames. 
], batch size: 64, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:05:00,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3151270.0, ans=0.0 2024-08-15 11:05:04,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3151270.0, ans=0.125 2024-08-15 11:05:20,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3151370.0, ans=0.0 2024-08-15 11:05:53,893 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 11:05:59,556 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 21 from LS+wenet, 17 from Vox, 28 from AS 2024-08-15 11:06:10,308 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 24 from Vox, 29 from AS 2024-08-15 11:06:22,941 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 10850, loss[loss=0.08098, beats_loss=0.007861, ecapa_loss=0.0001963, whisper_loss=0.07115, over 14870.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0108, ecapa_loss=0.0001473, whisper_loss=0.09032, over 3881737.89 frames. ], batch size: 63, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:06:26,996 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.371e+01 2.566e+01 2.885e+01 4.578e+01, threshold=5.132e+01, percent-clipped=0.0 2024-08-15 11:06:39,014 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.14 vs. limit=12.0 2024-08-15 11:06:39,899 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
19 from LS+wenet, 21 from Vox, 37 from AS 2024-08-15 11:06:41,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3151870.0, ans=0.125 2024-08-15 11:06:53,209 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.87 vs. limit=10.0 2024-08-15 11:07:00,615 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 from AS 2024-08-15 11:07:04,033 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 18 from Vox, 46 from AS 2024-08-15 11:07:07,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3151970.0, ans=0.125 2024-08-15 11:07:07,769 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.04 vs. limit=15.0 2024-08-15 11:07:08,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3151970.0, ans=0.125 2024-08-15 11:07:10,001 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 from AS 2024-08-15 11:07:34,377 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 30 from LS+wenet, 23 from Vox, 25 from AS 2024-08-15 11:07:44,723 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 10900, loss[loss=0.1131, beats_loss=0.00856, ecapa_loss=0.0001334, whisper_loss=0.1032, over 19643.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01071, ecapa_loss=0.0001473, whisper_loss=0.09081, over 3896288.04 frames. ], batch size: 75, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:08:12,124 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.48 vs. 
limit=15.0 2024-08-15 11:08:22,895 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 15 from Vox, 42 from AS 2024-08-15 11:08:46,830 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 24 from Vox, 39 from AS 2024-08-15 11:09:00,299 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 from AS 2024-08-15 11:09:09,560 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 10950, loss[loss=0.06287, beats_loss=0.0154, ecapa_loss=0.0001173, whisper_loss=0.04629, over 15817.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01064, ecapa_loss=0.000147, whisper_loss=0.09136, over 3886528.15 frames. ], batch size: 67, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:09:12,168 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.384e+01 2.656e+01 2.933e+01 4.855e+01, threshold=5.312e+01, percent-clipped=0.0 2024-08-15 11:09:19,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3152770.0, ans=0.0 2024-08-15 11:09:30,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3152870.0, ans=0.025 2024-08-15 11:10:07,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3153070.0, ans=0.0 2024-08-15 11:10:09,010 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.23 vs. 
limit=15.0 2024-08-15 11:10:09,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3153170.0, ans=0.1 2024-08-15 11:10:14,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3153170.0, ans=0.0 2024-08-15 11:10:19,712 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 20 from Vox, 36 from AS 2024-08-15 11:10:25,261 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 11000, loss[loss=0.09484, beats_loss=0.009015, ecapa_loss=0.0001765, whisper_loss=0.08406, over 18354.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01056, ecapa_loss=0.0001495, whisper_loss=0.09125, over 3909602.78 frames. ], batch size: 76, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:10:48,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3153370.0, ans=0.04949747468305833 2024-08-15 11:10:50,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3153370.0, ans=0.125 2024-08-15 11:10:51,257 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 19 from Vox, 29 from AS 2024-08-15 11:10:51,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3153370.0, ans=0.2 2024-08-15 11:10:52,652 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 21 from Vox, 37 from AS 2024-08-15 11:10:52,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3153370.0, ans=0.125 2024-08-15 11:10:56,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3153470.0, ans=0.1 2024-08-15 11:11:02,143 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
33 from LS+wenet, 22 from Vox, 34 from AS 2024-08-15 11:11:13,417 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 27 from Vox, 32 from AS 2024-08-15 11:11:13,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3153570.0, ans=0.125 2024-08-15 11:11:17,645 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 21 from LS+wenet, 15 from Vox, 27 from AS 2024-08-15 11:11:23,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3153670.0, ans=0.2 2024-08-15 11:11:38,505 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 11050, loss[loss=0.08176, beats_loss=0.012, ecapa_loss=0.0001548, whisper_loss=0.06821, over 17597.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01058, ecapa_loss=0.0001489, whisper_loss=0.09121, over 3921069.38 frames. ], batch size: 73, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:11:41,496 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.292e+01 2.575e+01 2.942e+01 2.806e+02, threshold=5.150e+01, percent-clipped=2.0 2024-08-15 11:12:00,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3153870.0, ans=0.125 2024-08-15 11:12:28,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3154070.0, ans=0.125 2024-08-15 11:12:41,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3154170.0, ans=0.125 2024-08-15 11:13:00,773 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 11100, loss[loss=0.1035, beats_loss=0.01163, ecapa_loss=0.0001333, whisper_loss=0.09054, over 23339.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0105, ecapa_loss=0.0001496, whisper_loss=0.09133, over 3885774.40 frames. 
], batch size: 91, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:13:20,275 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=15.0 2024-08-15 11:14:16,357 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 11150, loss[loss=0.1017, beats_loss=0.01043, ecapa_loss=0.0001702, whisper_loss=0.08962, over 20723.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01052, ecapa_loss=0.0001496, whisper_loss=0.09093, over 3883991.86 frames. ], batch size: 87, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:14:19,207 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.773e+01 2.361e+01 2.547e+01 2.785e+01 4.285e+01, threshold=5.094e+01, percent-clipped=0.0 2024-08-15 11:14:24,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3154770.0, ans=0.0 2024-08-15 11:14:32,220 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 30 from Vox, 34 from AS 2024-08-15 11:14:40,502 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.76 vs. limit=15.0 2024-08-15 11:14:46,997 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.47 vs. limit=10.0 2024-08-15 11:15:18,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3155170.0, ans=0.09899494936611666 2024-08-15 11:15:24,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3155170.0, ans=0.0 2024-08-15 11:15:28,536 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
20 from LS+wenet, 17 from Vox, 23 from AS 2024-08-15 11:15:31,184 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 11200, loss[loss=0.1071, beats_loss=0.01254, ecapa_loss=0.0001472, whisper_loss=0.09309, over 20595.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01059, ecapa_loss=0.0001497, whisper_loss=0.0906, over 3875225.20 frames. ], batch size: 82, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:15:38,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3155270.0, ans=0.125 2024-08-15 11:15:41,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3155270.0, ans=0.025 2024-08-15 11:15:50,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3155370.0, ans=0.125 2024-08-15 11:16:03,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3155470.0, ans=0.0 2024-08-15 11:16:03,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3155470.0, ans=0.125 2024-08-15 11:16:04,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3155470.0, ans=0.125 2024-08-15 11:16:10,433 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.84 vs. limit=15.0 2024-08-15 11:16:14,287 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 11:16:29,498 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
30 from LS+wenet, 21 from Vox, 42 from AS 2024-08-15 11:16:44,005 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 11250, loss[loss=0.1088, beats_loss=0.01173, ecapa_loss=0.0001349, whisper_loss=0.09567, over 15608.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01062, ecapa_loss=0.0001506, whisper_loss=0.09079, over 3916601.73 frames. ], batch size: 62, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:16:46,926 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.380e+01 2.622e+01 3.019e+01 1.107e+02, threshold=5.243e+01, percent-clipped=1.0 2024-08-15 11:16:53,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3155770.0, ans=0.0 2024-08-15 11:16:57,165 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.04 vs. limit=15.0 2024-08-15 11:16:59,392 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 26 from LS+wenet, 17 from Vox, 16 from AS 2024-08-15 11:17:15,976 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 26 from Vox, 35 from AS 2024-08-15 11:17:26,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3155970.0, ans=0.5 2024-08-15 11:17:38,153 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 25 from LS+wenet, 14 from Vox, 30 from AS 2024-08-15 11:17:46,542 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=15.0 2024-08-15 11:17:48,763 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.47 vs. 
limit=10.0 2024-08-15 11:18:00,545 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 11300, loss[loss=0.09113, beats_loss=0.01081, ecapa_loss=0.0001221, whisper_loss=0.0791, over 19338.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01058, ecapa_loss=0.0001497, whisper_loss=0.0909, over 3864880.38 frames. ], batch size: 76, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:18:26,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3156370.0, ans=0.0 2024-08-15 11:18:39,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3156470.0, ans=0.0 2024-08-15 11:18:41,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3156470.0, ans=0.0 2024-08-15 11:18:56,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3156570.0, ans=0.125 2024-08-15 11:19:26,087 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 11350, loss[loss=0.1242, beats_loss=0.008788, ecapa_loss=0.0001767, whisper_loss=0.1136, over 13902.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01048, ecapa_loss=0.0001497, whisper_loss=0.09123, over 3861422.47 frames. ], batch size: 55, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:19:29,195 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.374e+01 2.563e+01 2.940e+01 7.855e+01, threshold=5.126e+01, percent-clipped=1.0 2024-08-15 11:19:46,994 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 21 from Vox, 36 from AS 2024-08-15 11:19:49,648 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
25 from LS+wenet, 25 from Vox, 40 from AS 2024-08-15 11:19:49,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3156870.0, ans=0.125 2024-08-15 11:20:26,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3157170.0, ans=0.125 2024-08-15 11:20:36,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3157170.0, ans=15.0 2024-08-15 11:20:37,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3157170.0, ans=0.1 2024-08-15 11:20:39,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3157270.0, ans=0.1 2024-08-15 11:20:40,173 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 11400, loss[loss=0.1029, beats_loss=0.009822, ecapa_loss=0.0001934, whisper_loss=0.0911, over 15157.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01056, ecapa_loss=0.0001504, whisper_loss=0.09068, over 3870580.29 frames. ], batch size: 62, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:20:57,633 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 from AS 2024-08-15 11:21:07,607 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.63 vs. limit=22.5 2024-08-15 11:21:08,576 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 23 from LS+wenet, 19 from Vox, 21 from AS 2024-08-15 11:21:14,586 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.17 vs. 
limit=22.5 2024-08-15 11:21:30,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3157570.0, ans=0.125 2024-08-15 11:21:38,449 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 23 from Vox, 44 from AS 2024-08-15 11:21:42,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3157570.0, ans=0.0 2024-08-15 11:21:44,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3157670.0, ans=0.0 2024-08-15 11:21:48,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3157670.0, ans=0.2 2024-08-15 11:21:48,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3157670.0, ans=0.2 2024-08-15 11:21:53,495 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0 2024-08-15 11:22:00,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3157770.0, ans=0.1 2024-08-15 11:22:01,585 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 11450, loss[loss=0.08669, beats_loss=0.009634, ecapa_loss=0.0001938, whisper_loss=0.07512, over 14101.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01059, ecapa_loss=0.0001513, whisper_loss=0.09043, over 3870292.39 frames. 
], batch size: 61, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:22:04,387 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.396e+01 2.613e+01 2.879e+01 7.410e+02, threshold=5.227e+01, percent-clipped=0.0 2024-08-15 11:22:04,388 WARNING [optim.py:496] (2/4) Scaling gradients by 0.07053599506616592, model_norm_threshold=52.26521682739258 2024-08-15 11:22:04,562 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.out_combiner.bypass_scale with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.421e+04, grad_sumsq=9.420e+04, orig_rms_sq=5.754e-01 2024-08-15 11:22:07,278 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 20 from LS+wenet, 13 from Vox, 30 from AS 2024-08-15 11:22:14,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3157770.0, ans=0.125 2024-08-15 11:22:38,042 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 17 from Vox, 46 from AS 2024-08-15 11:22:38,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3157970.0, ans=0.125 2024-08-15 11:22:53,391 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 28 from LS+wenet, 20 from Vox, 38 from AS 2024-08-15 11:22:56,753 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 32 from Vox, 36 from AS 2024-08-15 11:23:05,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3158070.0, ans=0.125 2024-08-15 11:23:20,550 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 19 from Vox, 35 from AS 2024-08-15 11:23:23,250 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 11500, loss[loss=0.1028, beats_loss=0.00796, ecapa_loss=0.0001928, whisper_loss=0.09291, over 15729.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01063, ecapa_loss=0.0001514, whisper_loss=0.09011, over 3852109.12 frames. ], batch size: 65, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:23:31,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3158270.0, ans=0.1 2024-08-15 11:23:33,452 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 16 from LS+wenet, 21 from Vox, 26 from AS 2024-08-15 11:23:37,422 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.71 vs. limit=22.5 2024-08-15 11:23:43,748 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 11:23:56,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3158470.0, ans=0.2 2024-08-15 11:24:07,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3158470.0, ans=0.0 2024-08-15 11:24:22,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3158570.0, ans=0.2 2024-08-15 11:24:22,547 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.87 vs. limit=15.0 2024-08-15 11:24:34,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3158670.0, ans=0.125 2024-08-15 11:24:38,516 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 11:24:40,605 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 11550, loss[loss=0.1229, beats_loss=0.00878, ecapa_loss=0.0001509, whisper_loss=0.1126, over 22628.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01061, ecapa_loss=0.0001516, whisper_loss=0.09022, over 3879240.57 frames. ], batch size: 89, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:24:43,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3158770.0, ans=0.125 2024-08-15 11:24:44,453 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.411e+01 2.579e+01 2.880e+01 5.127e+01, threshold=5.159e+01, percent-clipped=1.0 2024-08-15 11:25:00,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3158870.0, ans=0.125 2024-08-15 11:25:11,258 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.07 vs. limit=22.5 2024-08-15 11:25:12,182 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 21 from Vox, 42 from AS 2024-08-15 11:25:14,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3158970.0, ans=0.125 2024-08-15 11:25:24,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3158970.0, ans=0.1 2024-08-15 11:25:28,921 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 23 from LS+wenet, 24 from Vox, 44 from AS 2024-08-15 11:25:46,000 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 21 from Vox, 22 from AS 2024-08-15 11:25:57,945 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 11600, loss[loss=0.08965, beats_loss=0.01076, ecapa_loss=0.0001152, whisper_loss=0.07775, over 17387.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01062, ecapa_loss=0.0001516, whisper_loss=0.09017, over 3916044.20 frames. 
], batch size: 67, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:25:59,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3159270.0, ans=0.0 2024-08-15 11:26:26,599 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.70 vs. limit=10.0 2024-08-15 11:26:59,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3159570.0, ans=0.0 2024-08-15 11:27:03,653 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 21 from Vox, 44 from AS 2024-08-15 11:27:16,682 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 11650, loss[loss=0.09424, beats_loss=0.008749, ecapa_loss=0.0001724, whisper_loss=0.08376, over 14108.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01064, ecapa_loss=0.0001515, whisper_loss=0.09004, over 3937456.44 frames. ], batch size: 57, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:27:19,971 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.447e+01 2.693e+01 2.991e+01 1.020e+02, threshold=5.386e+01, percent-clipped=2.0 2024-08-15 11:27:22,769 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.19 vs. 
limit=15.0 2024-08-15 11:27:26,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3159770.0, ans=0.1 2024-08-15 11:27:44,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3159870.0, ans=0.0 2024-08-15 11:28:02,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3159970.0, ans=0.0 2024-08-15 11:28:05,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3160070.0, ans=10.0 2024-08-15 11:28:05,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3160070.0, ans=0.125 2024-08-15 11:28:11,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3160070.0, ans=0.125 2024-08-15 11:28:26,716 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.55 vs. limit=10.0 2024-08-15 11:28:32,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3160170.0, ans=0.125 2024-08-15 11:28:34,455 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 11700, loss[loss=0.1028, beats_loss=0.01111, ecapa_loss=0.000151, whisper_loss=0.0902, over 13431.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01072, ecapa_loss=0.0001498, whisper_loss=0.09029, over 3916495.86 frames. ], batch size: 54, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:28:38,440 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.70 vs. 
limit=15.0 2024-08-15 11:28:48,149 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 13 from LS+wenet, 16 from Vox, 25 from AS 2024-08-15 11:28:58,667 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 from AS 2024-08-15 11:29:17,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3160570.0, ans=0.125 2024-08-15 11:29:22,465 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 from AS 2024-08-15 11:29:36,555 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.64 vs. limit=6.0 2024-08-15 11:29:45,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3160670.0, ans=0.0 2024-08-15 11:29:48,601 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 11750, loss[loss=0.1137, beats_loss=0.0112, ecapa_loss=0.0001668, whisper_loss=0.1008, over 22381.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01083, ecapa_loss=0.0001493, whisper_loss=0.09039, over 3915220.11 frames. ], batch size: 91, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:29:52,017 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.446e+01 2.685e+01 3.012e+01 3.635e+02, threshold=5.370e+01, percent-clipped=2.0 2024-08-15 11:30:07,592 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.04 vs. 
limit=22.5 2024-08-15 11:30:14,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3160870.0, ans=0.95 2024-08-15 11:30:22,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3160970.0, ans=0.0 2024-08-15 11:30:25,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3160970.0, ans=0.125 2024-08-15 11:30:31,253 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.90 vs. limit=15.0 2024-08-15 11:30:37,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3161070.0, ans=0.1 2024-08-15 11:30:53,654 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.30 vs. limit=22.5 2024-08-15 11:30:54,327 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 from AS 2024-08-15 11:30:55,854 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 12 from Vox, 27 from AS 2024-08-15 11:31:03,134 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 11800, loss[loss=0.1186, beats_loss=0.008883, ecapa_loss=0.0001342, whisper_loss=0.1084, over 22875.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01077, ecapa_loss=0.0001496, whisper_loss=0.09102, over 3901428.41 frames. ], batch size: 88, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:31:03,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3161270.0, ans=0.1 2024-08-15 11:31:04,643 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
17 from LS+wenet, 17 from Vox, 37 from AS 2024-08-15 11:31:25,811 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.49 vs. limit=12.0 2024-08-15 11:31:33,760 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 27 from LS+wenet, 28 from Vox, 41 from AS 2024-08-15 11:31:38,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3161470.0, ans=0.1 2024-08-15 11:31:40,144 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-08-15 11:31:42,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3161470.0, ans=0.0 2024-08-15 11:31:49,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3161570.0, ans=0.125 2024-08-15 11:31:54,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3161570.0, ans=0.1 2024-08-15 11:32:00,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3161670.0, ans=0.125 2024-08-15 11:32:07,085 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 31 from Vox, 31 from AS 2024-08-15 11:32:15,561 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 11850, loss[loss=0.09928, beats_loss=0.01329, ecapa_loss=9.675e-05, whisper_loss=0.08502, over 18991.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01072, ecapa_loss=0.0001498, whisper_loss=0.09098, over 3922187.95 frames. 
], batch size: 71, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:32:17,970 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.442e+01 2.720e+01 2.983e+01 2.168e+02, threshold=5.440e+01, percent-clipped=2.0 2024-08-15 11:32:25,409 WARNING [optim.py:496] (2/4) Scaling gradients by 0.029826095327734947, model_norm_threshold=54.40060806274414 2024-08-15 11:32:25,602 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.21, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.872e+05, grad_sumsq=7.654e+04, orig_rms_sq=8.977e+00 2024-08-15 11:32:27,489 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 23 from Vox, 44 from AS 2024-08-15 11:32:32,852 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.37 vs. limit=15.0 2024-08-15 11:32:36,249 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 18 from LS+wenet, 16 from Vox, 36 from AS 2024-08-15 11:32:42,382 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 16 from LS+wenet, 15 from Vox, 27 from AS 2024-08-15 11:32:48,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3161970.0, ans=0.0 2024-08-15 11:32:48,533 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.50 vs. limit=10.0 2024-08-15 11:33:07,025 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 20 from LS+wenet, 28 from Vox, 43 from AS 2024-08-15 11:33:17,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3162170.0, ans=0.05 2024-08-15 11:33:20,455 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
34 from LS+wenet, 21 from Vox, 32 from AS 2024-08-15 11:33:29,018 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 11900, loss[loss=0.09814, beats_loss=0.009926, ecapa_loss=0.0001917, whisper_loss=0.0863, over 17445.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01071, ecapa_loss=0.0001508, whisper_loss=0.09143, over 3946201.51 frames. ], batch size: 73, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:33:31,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3162270.0, ans=0.125 2024-08-15 11:33:35,665 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 34 from LS+wenet, 21 from Vox, 34 from AS 2024-08-15 11:33:36,284 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=15.0 2024-08-15 11:33:38,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3162270.0, ans=0.2 2024-08-15 11:33:46,620 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.66 vs. limit=22.5 2024-08-15 11:33:49,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3162370.0, ans=0.0 2024-08-15 11:34:01,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3162470.0, ans=0.05 2024-08-15 11:34:02,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3162470.0, ans=0.125 2024-08-15 11:34:05,831 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
27 from LS+wenet, 18 from Vox, 32 from AS 2024-08-15 11:34:17,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3162570.0, ans=0.0 2024-08-15 11:34:43,953 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 11950, loss[loss=0.09297, beats_loss=0.01148, ecapa_loss=0.0001076, whisper_loss=0.08042, over 16103.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01069, ecapa_loss=0.0001513, whisper_loss=0.09118, over 3922191.09 frames. ], batch size: 62, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:34:46,929 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.257e+01 2.477e+01 2.736e+01 1.824e+03, threshold=4.954e+01, percent-clipped=1.0 2024-08-15 11:34:54,445 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 19 from Vox, 46 from AS 2024-08-15 11:34:58,174 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.01 vs. limit=10.0 2024-08-15 11:35:05,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3162870.0, ans=0.2 2024-08-15 11:35:06,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3162870.0, ans=0.125 2024-08-15 11:35:07,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3162870.0, ans=0.0 2024-08-15 11:35:16,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3162970.0, ans=0.1 2024-08-15 11:35:19,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3162970.0, ans=0.05 2024-08-15 11:35:24,881 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
26 from LS+wenet, 19 from Vox, 26 from AS 2024-08-15 11:35:31,342 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.044e+00 2024-08-15 11:35:34,167 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 21 from LS+wenet, 23 from Vox, 26 from AS 2024-08-15 11:35:53,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3163170.0, ans=0.125 2024-08-15 11:35:57,238 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 12000, loss[loss=0.1236, beats_loss=0.009687, ecapa_loss=0.0001495, whisper_loss=0.1124, over 23007.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01071, ecapa_loss=0.0001506, whisper_loss=0.09064, over 3921445.73 frames. ], batch size: 91, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:35:57,239 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-15 11:36:35,738 INFO [train_multi_KD3.py:1149] (2/4) Epoch 22, validation on ASR_libri: loss=0.2516, beats_loss=0, ecapa_loss=0.0005396, whisper_loss=0.2462, over 922467.00 frames. 2024-08-15 11:36:56,041 INFO [train_multi_KD3.py:1149] (2/4) Epoch 22, validation on SV_voxceleb1: loss=0.004196, beats_loss=0, ecapa_loss=0.0004196, whisper_loss=0, over 939242.00 frames. 2024-08-15 11:38:51,451 INFO [train_multi_KD3.py:1149] (2/4) Epoch 22, validation on AT_audioset: loss=0.02333, beats_loss=0.02333, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 11:38:51,456 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-15 11:38:51,627 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
22 from LS+wenet, 28 from Vox, 43 from AS 2024-08-15 11:39:01,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3163270.0, ans=0.1 2024-08-15 11:39:06,130 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.17 vs. limit=15.0 2024-08-15 11:39:07,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3163370.0, ans=0.125 2024-08-15 11:39:07,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3163370.0, ans=0.125 2024-08-15 11:39:30,790 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.49 vs. limit=12.0 2024-08-15 11:39:37,483 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 15 from LS+wenet, 20 from Vox, 30 from AS 2024-08-15 11:40:04,847 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 12050, loss[loss=0.09508, beats_loss=0.009873, ecapa_loss=0.0001534, whisper_loss=0.08368, over 22319.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.0001499, whisper_loss=0.09089, over 3922616.02 frames. ], batch size: 87, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:40:07,901 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.427e+01 2.583e+01 3.021e+01 1.024e+02, threshold=5.165e+01, percent-clipped=2.0 2024-08-15 11:40:18,454 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 13 from LS+wenet, 20 from Vox, 24 from AS 2024-08-15 11:40:35,885 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.18 vs. 
limit=6.0 2024-08-15 11:40:41,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3163970.0, ans=0.0 2024-08-15 11:40:54,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3164070.0, ans=0.125 2024-08-15 11:41:11,689 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 15 from Vox, 33 from AS 2024-08-15 11:41:19,147 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 12100, loss[loss=0.09342, beats_loss=0.01118, ecapa_loss=0.0001625, whisper_loss=0.08062, over 21947.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0107, ecapa_loss=0.0001496, whisper_loss=0.09079, over 3894962.25 frames. ], batch size: 92, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:41:32,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3164370.0, ans=0.07 2024-08-15 11:41:36,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3164370.0, ans=0.1 2024-08-15 11:41:46,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3164470.0, ans=0.0 2024-08-15 11:41:47,923 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 11:42:02,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3164570.0, ans=0.125 2024-08-15 11:42:23,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3164670.0, ans=0.125 2024-08-15 11:42:26,009 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
25 from LS+wenet, 19 from Vox, 39 from AS 2024-08-15 11:42:31,625 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 12150, loss[loss=0.09617, beats_loss=0.01185, ecapa_loss=0.0001268, whisper_loss=0.08305, over 23894.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01066, ecapa_loss=0.0001509, whisper_loss=0.09067, over 3887975.49 frames. ], batch size: 95, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:42:34,346 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.205e+01 2.453e+01 2.798e+01 9.875e+01, threshold=4.907e+01, percent-clipped=1.0 2024-08-15 11:42:52,019 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 27 from LS+wenet, 13 from Vox, 22 from AS 2024-08-15 11:42:56,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3164870.0, ans=0.0 2024-08-15 11:42:59,090 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 11 from Vox, 33 from AS 2024-08-15 11:43:07,607 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 20 from Vox, 44 from AS 2024-08-15 11:43:39,881 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2024-08-15 11:43:46,704 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 12200, loss[loss=0.07454, beats_loss=0.01117, ecapa_loss=0.0001911, whisper_loss=0.06146, over 15930.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01067, ecapa_loss=0.0001503, whisper_loss=0.09021, over 3853365.98 frames. ], batch size: 69, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:43:50,016 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
35 from LS+wenet, 18 from Vox, 36 from AS 2024-08-15 11:44:13,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3165370.0, ans=0.0 2024-08-15 11:44:19,036 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 13 from LS+wenet, 13 from Vox, 35 from AS 2024-08-15 11:44:22,326 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 20 from LS+wenet, 24 from Vox, 40 from AS 2024-08-15 11:44:26,851 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 14 from LS+wenet, 18 from Vox, 28 from AS 2024-08-15 11:44:38,838 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 20 from LS+wenet, 17 from Vox, 30 from AS 2024-08-15 11:44:42,950 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 28 from LS+wenet, 21 from Vox, 38 from AS 2024-08-15 11:45:01,764 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 12250, loss[loss=0.1034, beats_loss=0.008738, ecapa_loss=0.0002007, whisper_loss=0.09265, over 21546.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01065, ecapa_loss=0.0001505, whisper_loss=0.09023, over 3845577.53 frames. ], batch size: 92, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:45:01,955 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 22 from Vox, 26 from AS 2024-08-15 11:45:04,746 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.418e+01 2.740e+01 3.244e+01 5.356e+01, threshold=5.480e+01, percent-clipped=1.0 2024-08-15 11:45:08,152 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
24 from LS+wenet, 25 from Vox, 24 from AS 2024-08-15 11:45:20,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3165870.0, ans=0.125 2024-08-15 11:45:29,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3165870.0, ans=0.07 2024-08-15 11:45:37,281 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.58 vs. limit=6.0 2024-08-15 11:46:07,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3166170.0, ans=0.125 2024-08-15 11:46:09,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3166170.0, ans=0.0 2024-08-15 11:46:14,272 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.21 vs. limit=22.5 2024-08-15 11:46:15,841 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.90 vs. limit=12.0 2024-08-15 11:46:16,394 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 12300, loss[loss=0.1243, beats_loss=0.00871, ecapa_loss=0.000179, whisper_loss=0.1138, over 16216.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.0001513, whisper_loss=0.09055, over 3842547.83 frames. 
], batch size: 64, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:46:17,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3166270.0, ans=0.125 2024-08-15 11:46:17,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3166270.0, ans=0.125 2024-08-15 11:46:17,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3166270.0, ans=0.125 2024-08-15 11:46:28,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3166270.0, ans=0.0 2024-08-15 11:46:32,578 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 12 from LS+wenet, 19 from Vox, 33 from AS 2024-08-15 11:46:41,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3166370.0, ans=0.2 2024-08-15 11:46:48,266 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.46 vs. limit=10.0 2024-08-15 11:46:52,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3166470.0, ans=0.05 2024-08-15 11:46:57,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3166470.0, ans=0.125 2024-08-15 11:47:02,263 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 23 from LS+wenet, 13 from Vox, 24 from AS 2024-08-15 11:47:03,628 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 25 from Vox, 42 from AS 2024-08-15 11:47:13,185 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
34 from LS+wenet, 19 from Vox, 35 from AS 2024-08-15 11:47:29,293 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 12350, loss[loss=0.09931, beats_loss=0.01068, ecapa_loss=0.0001335, whisper_loss=0.08729, over 16552.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01054, ecapa_loss=0.0001504, whisper_loss=0.09066, over 3868294.61 frames. ], batch size: 66, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:47:32,265 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.362e+01 2.585e+01 2.912e+01 4.342e+01, threshold=5.170e+01, percent-clipped=0.0 2024-08-15 11:47:37,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3166770.0, ans=0.125 2024-08-15 11:47:44,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3166870.0, ans=0.125 2024-08-15 11:48:01,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3166970.0, ans=0.125 2024-08-15 11:48:08,656 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 from AS 2024-08-15 11:48:14,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3167070.0, ans=0.125 2024-08-15 11:48:20,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3167070.0, ans=0.05 2024-08-15 11:48:39,482 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 from AS 2024-08-15 11:48:43,658 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 12400, loss[loss=0.07899, beats_loss=0.01273, ecapa_loss=0.0001417, whisper_loss=0.06484, over 16176.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01044, ecapa_loss=0.0001511, whisper_loss=0.09113, over 3878825.33 frames. 
], batch size: 67, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:48:56,515 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 14 from LS+wenet, 20 from Vox, 27 from AS 2024-08-15 11:49:09,338 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 19 from LS+wenet, 16 from Vox, 28 from AS 2024-08-15 11:49:17,058 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 19 from Vox, 32 from AS 2024-08-15 11:49:22,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3167470.0, ans=0.125 2024-08-15 11:49:24,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3167470.0, ans=0.2 2024-08-15 11:49:28,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3167570.0, ans=0.2 2024-08-15 11:49:33,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3167570.0, ans=0.0 2024-08-15 11:49:34,304 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.16 vs. limit=15.0 2024-08-15 11:49:58,067 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 12450, loss[loss=0.1149, beats_loss=0.009622, ecapa_loss=0.0001914, whisper_loss=0.1033, over 20051.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01045, ecapa_loss=0.000151, whisper_loss=0.09071, over 3880177.73 frames. 
], batch size: 86, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:49:58,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3167770.0, ans=0.0 2024-08-15 11:50:01,265 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.281e+01 2.553e+01 2.853e+01 4.118e+01, threshold=5.106e+01, percent-clipped=0.0 2024-08-15 11:50:28,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3167970.0, ans=0.125 2024-08-15 11:50:32,559 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 12 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 11:50:42,637 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 16 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-15 11:50:50,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3168070.0, ans=0.125 2024-08-15 11:50:54,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3168070.0, ans=0.125 2024-08-15 11:51:09,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3168170.0, ans=0.125 2024-08-15 11:51:11,611 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 12500, loss[loss=0.09767, beats_loss=0.009839, ecapa_loss=0.0001523, whisper_loss=0.0863, over 22369.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01048, ecapa_loss=0.0001499, whisper_loss=0.09058, over 3850889.58 frames. 
], batch size: 89, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:51:21,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3168270.0, ans=15.0 2024-08-15 11:51:22,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3168270.0, ans=0.125 2024-08-15 11:51:56,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3168570.0, ans=0.125 2024-08-15 11:52:04,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3168570.0, ans=0.125 2024-08-15 11:52:08,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3168570.0, ans=0.0 2024-08-15 11:52:11,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3168670.0, ans=0.1 2024-08-15 11:52:26,416 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 12550, loss[loss=0.1062, beats_loss=0.01007, ecapa_loss=0.0001466, whisper_loss=0.09471, over 22717.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01052, ecapa_loss=0.0001496, whisper_loss=0.09083, over 3855780.38 frames. 
], batch size: 90, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:52:29,388 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.452e+01 2.757e+01 2.941e+01 1.392e+02, threshold=5.513e+01, percent-clipped=1.0 2024-08-15 11:53:19,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3169070.0, ans=0.125 2024-08-15 11:53:21,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3169070.0, ans=0.0 2024-08-15 11:53:23,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3169070.0, ans=0.125 2024-08-15 11:53:28,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3169170.0, ans=0.0 2024-08-15 11:53:30,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3169170.0, ans=0.2 2024-08-15 11:53:35,320 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 23 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-15 11:53:37,209 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 36 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-15 11:53:39,980 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 36 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-15 11:53:40,993 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 12600, loss[loss=0.1302, beats_loss=0.008137, ecapa_loss=0.0001697, whisper_loss=0.1204, over 21538.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01058, ecapa_loss=0.0001499, whisper_loss=0.09132, over 3878165.46 frames. 
], batch size: 85, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:53:41,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3169270.0, ans=0.07 2024-08-15 11:53:46,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3169270.0, ans=0.125 2024-08-15 11:54:03,493 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-15 11:54:10,468 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 15 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-15 11:54:15,364 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.90 vs. limit=15.0 2024-08-15 11:54:24,447 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 18 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-15 11:54:32,352 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.27 vs. limit=15.0 2024-08-15 11:54:33,879 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-15 11:54:38,136 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 22 from LS+wenet, 30 from Vox, 41 fro AS 2024-08-15 11:54:38,919 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.53 vs. 
limit=22.5 2024-08-15 11:54:52,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3169670.0, ans=0.125 2024-08-15 11:54:53,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3169670.0, ans=0.07 2024-08-15 11:54:55,998 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 12650, loss[loss=0.07725, beats_loss=0.01043, ecapa_loss=0.0001848, whisper_loss=0.06497, over 14975.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01065, ecapa_loss=0.0001507, whisper_loss=0.09083, over 3870647.80 frames. ], batch size: 63, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:54:56,294 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-15 11:54:58,965 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.444e+01 2.686e+01 2.954e+01 5.186e+01, threshold=5.373e+01, percent-clipped=0.0 2024-08-15 11:55:12,210 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-15 11:55:37,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3169970.0, ans=0.0 2024-08-15 11:55:51,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3170070.0, ans=0.2 2024-08-15 11:56:01,300 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-15 11:56:09,273 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 12700, loss[loss=0.09239, beats_loss=0.01245, ecapa_loss=0.0001351, whisper_loss=0.07858, over 17700.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01073, ecapa_loss=0.0001508, whisper_loss=0.09047, over 3858579.21 frames. 
], batch size: 72, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:56:42,644 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-15 11:56:48,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3170470.0, ans=0.125 2024-08-15 11:56:50,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3170470.0, ans=0.125 2024-08-15 11:56:53,397 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0 2024-08-15 11:57:18,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3170670.0, ans=0.2 2024-08-15 11:57:20,123 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.65 vs. limit=15.0 2024-08-15 11:57:22,272 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 12750, loss[loss=0.09637, beats_loss=0.008012, ecapa_loss=0.0001886, whisper_loss=0.08647, over 14643.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01077, ecapa_loss=0.0001507, whisper_loss=0.09044, over 3863678.52 frames. ], batch size: 58, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:57:25,250 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.276e+01 2.433e+01 2.763e+01 4.017e+01, threshold=4.866e+01, percent-clipped=0.0 2024-08-15 11:58:08,507 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.08 vs. 
limit=15.0 2024-08-15 11:58:30,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3171170.0, ans=0.025 2024-08-15 11:58:35,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3171170.0, ans=0.04949747468305833 2024-08-15 11:58:39,824 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 12800, loss[loss=0.08438, beats_loss=0.01051, ecapa_loss=0.0001453, whisper_loss=0.07241, over 13389.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01079, ecapa_loss=0.0001516, whisper_loss=0.08999, over 3842583.04 frames. ], batch size: 55, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:58:52,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3171270.0, ans=0.125 2024-08-15 11:59:06,770 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 19 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-15 11:59:17,332 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-15 11:59:19,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3171470.0, ans=0.125 2024-08-15 11:59:43,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3171670.0, ans=0.07 2024-08-15 11:59:47,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3171670.0, ans=0.0 2024-08-15 11:59:54,403 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 12850, loss[loss=0.09551, beats_loss=0.01181, ecapa_loss=0.0001771, whisper_loss=0.08193, over 22210.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01074, ecapa_loss=0.0001513, whisper_loss=0.08948, over 3826565.02 frames. 
], batch size: 92, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:59:57,439 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.257e+01 2.519e+01 2.816e+01 4.550e+01, threshold=5.038e+01, percent-clipped=0.0 2024-08-15 12:00:12,627 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 23 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-15 12:00:14,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3171870.0, ans=0.0 2024-08-15 12:00:18,536 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 30 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-15 12:00:23,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3171970.0, ans=0.125 2024-08-15 12:00:37,620 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-15 12:00:38,003 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 12:00:39,721 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3172070.0, ans=0.2 2024-08-15 12:00:54,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3172170.0, ans=0.125 2024-08-15 12:01:01,250 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-15 12:01:02,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3172170.0, ans=0.2 2024-08-15 12:01:08,158 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 12900, loss[loss=0.139, beats_loss=0.006739, ecapa_loss=0.0001825, whisper_loss=0.1304, over 23138.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01065, ecapa_loss=0.0001513, whisper_loss=0.08926, over 3801806.90 frames. 
], batch size: 90, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:01:15,577 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 27 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-15 12:01:33,195 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-15 12:01:36,557 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-15 12:01:41,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3172470.0, ans=0.1 2024-08-15 12:01:42,423 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 21 from LS+wenet, 18 from Vox, 55 fro AS 2024-08-15 12:01:46,733 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-15 12:01:49,654 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-15 12:01:50,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3172470.0, ans=0.1 2024-08-15 12:01:54,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3172570.0, ans=0.2 2024-08-15 12:02:15,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3172670.0, ans=0.0 2024-08-15 12:02:20,741 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 12 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-15 12:02:21,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3172770.0, ans=0.2 2024-08-15 12:02:21,814 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 12950, loss[loss=0.05978, beats_loss=0.01262, ecapa_loss=0.0001482, whisper_loss=0.04568, over 15380.00 frames. 
], tot_loss[loss=0.1014, beats_loss=0.01062, ecapa_loss=0.0001506, whisper_loss=0.08932, over 3822473.96 frames. ], batch size: 64, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:02:24,916 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.275e+01 2.546e+01 2.873e+01 4.108e+01, threshold=5.092e+01, percent-clipped=0.0 2024-08-15 12:02:45,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3172870.0, ans=0.125 2024-08-15 12:02:50,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3172870.0, ans=0.125 2024-08-15 12:02:53,071 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 33 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-15 12:03:00,474 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-15 12:03:12,061 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-15 12:03:12,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3173070.0, ans=0.1 2024-08-15 12:03:14,316 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.05 vs. limit=15.0 2024-08-15 12:03:24,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3173170.0, ans=0.125 2024-08-15 12:03:30,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3173170.0, ans=0.025 2024-08-15 12:03:37,121 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 13000, loss[loss=0.1235, beats_loss=0.007803, ecapa_loss=0.0001463, whisper_loss=0.1142, over 15485.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.01065, ecapa_loss=0.0001501, whisper_loss=0.08977, over 3849488.01 frames. ], batch size: 59, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:03:44,012 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.94 vs. limit=22.5 2024-08-15 12:03:51,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3173370.0, ans=0.125 2024-08-15 12:03:56,991 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 18 from Vox, 49 fro AS 2024-08-15 12:04:03,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3173370.0, ans=0.125 2024-08-15 12:04:06,724 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0 2024-08-15 12:04:07,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3173470.0, ans=0.125 2024-08-15 12:04:10,544 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 32 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-15 12:04:20,484 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0589878149330616, model_norm_threshold=50.92251968383789 2024-08-15 12:04:20,654 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.49, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.672e+05, grad_sumsq=3.654e+07, orig_rms_sq=1.005e-02 2024-08-15 12:04:22,291 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-15 12:04:30,340 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-15 12:04:31,738 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
15 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-15 12:04:35,905 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-15 12:04:41,015 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0 2024-08-15 12:04:48,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3173670.0, ans=0.0 2024-08-15 12:04:51,886 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 13050, loss[loss=0.08553, beats_loss=0.01167, ecapa_loss=0.0001683, whisper_loss=0.07219, over 18632.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0106, ecapa_loss=0.000151, whisper_loss=0.09013, over 3830485.49 frames. ], batch size: 79, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:04:54,259 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.33 vs. limit=12.0 2024-08-15 12:04:54,717 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.413e+01 2.554e+01 2.771e+01 8.633e+02, threshold=5.107e+01, percent-clipped=2.0 2024-08-15 12:05:01,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3173770.0, ans=0.125 2024-08-15 12:05:03,631 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 19 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-15 12:05:08,138 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-15 12:05:18,846 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
24 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-15 12:05:25,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3173970.0, ans=0.125 2024-08-15 12:05:28,166 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.76 vs. limit=10.0 2024-08-15 12:05:29,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3173970.0, ans=0.1 2024-08-15 12:05:37,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3174070.0, ans=10.0 2024-08-15 12:05:40,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3174070.0, ans=0.2 2024-08-15 12:05:41,614 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 22 from LS+wenet, 18 from Vox, 14 fro AS 2024-08-15 12:06:06,672 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 13100, loss[loss=0.1056, beats_loss=0.01012, ecapa_loss=0.000141, whisper_loss=0.09407, over 22103.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01051, ecapa_loss=0.0001509, whisper_loss=0.0906, over 3814860.98 frames. ], batch size: 88, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:06:20,168 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.15 vs. limit=12.0 2024-08-15 12:06:38,629 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-15 12:06:53,200 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-15 12:06:56,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3174570.0, ans=0.1 2024-08-15 12:07:05,503 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.25 vs. limit=6.0 2024-08-15 12:07:14,045 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.722e+05 2024-08-15 12:07:20,938 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 13150, loss[loss=0.1281, beats_loss=0.007749, ecapa_loss=0.0001554, whisper_loss=0.1188, over 16050.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01061, ecapa_loss=0.0001499, whisper_loss=0.09047, over 3842191.93 frames. ], batch size: 62, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:07:21,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3174770.0, ans=0.125 2024-08-15 12:07:21,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3174770.0, ans=0.1 2024-08-15 12:07:23,846 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.322e+01 2.580e+01 2.894e+01 4.254e+01, threshold=5.159e+01, percent-clipped=0.0 2024-08-15 12:07:31,745 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 14 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-15 12:07:38,633 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-15 12:07:45,599 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2024-08-15 12:07:52,106 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
20 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-15 12:07:54,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3174970.0, ans=0.04949747468305833 2024-08-15 12:07:58,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3174970.0, ans=0.125 2024-08-15 12:08:15,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3175070.0, ans=0.0 2024-08-15 12:08:23,640 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.85 vs. limit=12.0 2024-08-15 12:08:32,234 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.78 vs. limit=15.0 2024-08-15 12:08:34,119 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 13200, loss[loss=0.07763, beats_loss=0.01153, ecapa_loss=0.0001697, whisper_loss=0.0644, over 21158.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01063, ecapa_loss=0.0001491, whisper_loss=0.0904, over 3838156.02 frames. ], batch size: 89, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:08:44,270 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.44 vs. limit=15.0 2024-08-15 12:09:06,139 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.68 vs. limit=10.0 2024-08-15 12:09:10,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3175470.0, ans=0.1 2024-08-15 12:09:23,286 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
22 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-15 12:09:24,579 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 13 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-15 12:09:27,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3175570.0, ans=0.0 2024-08-15 12:09:29,920 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0 2024-08-15 12:09:39,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3175670.0, ans=0.015 2024-08-15 12:09:40,977 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-15 12:09:50,271 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 13250, loss[loss=0.08662, beats_loss=0.01341, ecapa_loss=0.0001253, whisper_loss=0.07196, over 19295.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01066, ecapa_loss=0.0001499, whisper_loss=0.09007, over 3839374.76 frames. ], batch size: 77, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:09:53,268 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.280e+01 2.540e+01 2.785e+01 5.121e+01, threshold=5.079e+01, percent-clipped=0.0 2024-08-15 12:10:31,077 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-15 12:10:33,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3175970.0, ans=0.0 2024-08-15 12:11:05,754 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 13300, loss[loss=0.1002, beats_loss=0.01274, ecapa_loss=0.0001577, whisper_loss=0.08586, over 20666.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01063, ecapa_loss=0.0001508, whisper_loss=0.08993, over 3833050.07 frames. 
], batch size: 85, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:11:09,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3176270.0, ans=0.0 2024-08-15 12:11:10,197 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 19 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-15 12:11:12,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3176270.0, ans=0.125 2024-08-15 12:11:13,257 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-15 12:11:38,671 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.76 vs. limit=6.0 2024-08-15 12:11:54,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3176570.0, ans=0.125 2024-08-15 12:11:55,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3176570.0, ans=0.125 2024-08-15 12:12:01,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3176570.0, ans=0.125 2024-08-15 12:12:18,523 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 13350, loss[loss=0.1307, beats_loss=0.007032, ecapa_loss=0.0001761, whisper_loss=0.1219, over 19244.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0106, ecapa_loss=0.00015, whisper_loss=0.09024, over 3857005.26 frames. 
], batch size: 75, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:12:20,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3176770.0, ans=0.0 2024-08-15 12:12:21,409 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.329e+01 2.607e+01 2.940e+01 2.592e+02, threshold=5.213e+01, percent-clipped=3.0 2024-08-15 12:12:37,392 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-15 12:12:39,830 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.98 vs. limit=5.0 2024-08-15 12:12:44,482 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-15 12:12:58,732 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 20 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-15 12:13:01,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3177070.0, ans=0.125 2024-08-15 12:13:32,337 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 13400, loss[loss=0.1092, beats_loss=0.01185, ecapa_loss=0.0001235, whisper_loss=0.09615, over 23044.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01066, ecapa_loss=0.0001496, whisper_loss=0.08973, over 3873611.54 frames. ], batch size: 90, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:13:34,164 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-15 12:13:39,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3177270.0, ans=0.1 2024-08-15 12:13:46,295 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
14 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-15 12:13:53,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3177370.0, ans=10.0 2024-08-15 12:14:23,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3177570.0, ans=0.125 2024-08-15 12:14:26,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3177570.0, ans=0.125 2024-08-15 12:14:38,452 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-15 12:14:45,826 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 13450, loss[loss=0.1131, beats_loss=0.01098, ecapa_loss=0.0001685, whisper_loss=0.1004, over 20891.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01062, ecapa_loss=0.0001504, whisper_loss=0.09031, over 3887390.60 frames. ], batch size: 88, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:14:48,642 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.536e+01 2.666e+01 2.899e+01 1.016e+02, threshold=5.331e+01, percent-clipped=2.0 2024-08-15 12:14:50,344 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-15 12:14:56,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3177770.0, ans=0.0 2024-08-15 12:14:57,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3177770.0, ans=0.125 2024-08-15 12:15:02,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3177870.0, ans=0.1 2024-08-15 12:15:03,495 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
28 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-15 12:15:16,912 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 27 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-15 12:15:25,855 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.55 vs. limit=15.0 2024-08-15 12:15:28,013 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-15 12:15:31,248 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-15 12:15:40,497 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 16 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-15 12:15:45,057 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 12:15:58,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3178170.0, ans=0.0 2024-08-15 12:16:00,551 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 13500, loss[loss=0.1072, beats_loss=0.01043, ecapa_loss=0.0001621, whisper_loss=0.09513, over 22649.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0106, ecapa_loss=0.0001502, whisper_loss=0.09099, over 3895735.96 frames. ], batch size: 90, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:16:10,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3178270.0, ans=0.125 2024-08-15 12:16:17,292 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-15 12:16:17,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3178370.0, ans=0.125 2024-08-15 12:16:32,665 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
23 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-15 12:16:41,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3178470.0, ans=0.1 2024-08-15 12:16:47,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3178570.0, ans=0.125 2024-08-15 12:16:53,268 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 19 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-15 12:16:54,655 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 22 from LS+wenet, 22 from Vox, 51 fro AS 2024-08-15 12:17:14,729 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 13550, loss[loss=0.1134, beats_loss=0.009586, ecapa_loss=0.000187, whisper_loss=0.1019, over 19444.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01056, ecapa_loss=0.0001507, whisper_loss=0.0914, over 3883562.74 frames. ], batch size: 81, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:17:17,963 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.308e+01 2.563e+01 2.825e+01 4.152e+01, threshold=5.126e+01, percent-clipped=0.0 2024-08-15 12:17:22,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3178770.0, ans=0.09899494936611666 2024-08-15 12:17:26,107 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.46 vs. limit=22.5 2024-08-15 12:17:28,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3178870.0, ans=0.0 2024-08-15 12:17:45,462 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
24 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-15 12:17:50,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3178970.0, ans=0.025 2024-08-15 12:18:28,354 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 13600, loss[loss=0.1052, beats_loss=0.01074, ecapa_loss=0.0001515, whisper_loss=0.09296, over 22446.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0106, ecapa_loss=0.0001494, whisper_loss=0.09123, over 3884464.52 frames. ], batch size: 87, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:18:29,388 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.91 vs. limit=15.0 2024-08-15 12:18:51,991 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 10 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-15 12:18:59,953 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.61 vs. limit=15.0 2024-08-15 12:19:02,973 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.31 vs. limit=6.0 2024-08-15 12:19:26,999 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.61 vs. limit=10.0 2024-08-15 12:19:41,948 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 13650, loss[loss=0.09363, beats_loss=0.01367, ecapa_loss=0.0001087, whisper_loss=0.07887, over 23259.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01073, ecapa_loss=0.0001489, whisper_loss=0.09038, over 3895260.80 frames. 
], batch size: 92, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:19:44,954 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.314e+01 2.512e+01 2.853e+01 1.013e+02, threshold=5.025e+01, percent-clipped=2.0 2024-08-15 12:20:34,443 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-15 12:20:40,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3180170.0, ans=0.125 2024-08-15 12:20:41,218 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.47 vs. limit=15.0 2024-08-15 12:20:42,991 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.17 vs. limit=10.0 2024-08-15 12:20:55,468 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 13700, loss[loss=0.1118, beats_loss=0.01039, ecapa_loss=0.0001654, whisper_loss=0.09979, over 21968.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0108, ecapa_loss=0.0001486, whisper_loss=0.09053, over 3913670.39 frames. ], batch size: 88, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:20:59,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3180270.0, ans=0.0 2024-08-15 12:21:12,780 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 25 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-15 12:21:17,204 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 17 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-15 12:21:22,701 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.36 vs. 
limit=15.0 2024-08-15 12:22:09,355 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0 2024-08-15 12:22:11,420 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 13750, loss[loss=0.1119, beats_loss=0.009877, ecapa_loss=0.0001546, whisper_loss=0.1004, over 17971.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01077, ecapa_loss=0.0001483, whisper_loss=0.09043, over 3867925.24 frames. ], batch size: 74, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:22:13,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3180770.0, ans=0.125 2024-08-15 12:22:14,195 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.271e+01 2.530e+01 2.885e+01 4.854e+01, threshold=5.060e+01, percent-clipped=0.0 2024-08-15 12:22:28,159 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-15 12:22:38,629 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-15 12:22:42,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3180970.0, ans=0.2 2024-08-15 12:22:44,883 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 14 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-15 12:23:00,088 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 12:23:10,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3181170.0, ans=0.125 2024-08-15 12:23:25,780 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 13800, loss[loss=0.12, beats_loss=0.008513, ecapa_loss=0.0001589, whisper_loss=0.1099, over 15385.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.01086, ecapa_loss=0.0001486, whisper_loss=0.08958, over 3854384.49 frames. ], batch size: 55, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:23:53,942 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-15 12:24:07,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3181470.0, ans=0.0 2024-08-15 12:24:17,803 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 29 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-15 12:24:18,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3181570.0, ans=0.125 2024-08-15 12:24:19,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3181570.0, ans=0.09899494936611666 2024-08-15 12:24:19,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3181570.0, ans=0.125 2024-08-15 12:24:24,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3181670.0, ans=0.05 2024-08-15 12:24:25,913 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.372e+00 2024-08-15 12:24:36,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3181670.0, ans=0.125 2024-08-15 12:24:40,095 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 13850, loss[loss=0.1194, beats_loss=0.008297, ecapa_loss=0.0001791, whisper_loss=0.1093, over 19057.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01085, ecapa_loss=0.000148, whisper_loss=0.09009, over 3876782.00 frames. 
], batch size: 78, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:24:43,015 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.369e+01 2.668e+01 2.994e+01 7.332e+01, threshold=5.336e+01, percent-clipped=2.0 2024-08-15 12:24:43,642 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 12:25:11,010 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 23 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-15 12:25:36,412 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0 2024-08-15 12:25:41,677 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 18 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-15 12:25:49,375 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-15 12:25:52,362 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 16 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-15 12:25:53,658 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 13900, loss[loss=0.08317, beats_loss=0.01355, ecapa_loss=0.0001407, whisper_loss=0.06821, over 17832.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01077, ecapa_loss=0.0001486, whisper_loss=0.09052, over 3890843.31 frames. ], batch size: 74, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:25:55,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3182270.0, ans=0.1 2024-08-15 12:26:11,608 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-15 12:26:41,611 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.14 vs. 
limit=12.0 2024-08-15 12:26:43,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3182570.0, ans=0.125 2024-08-15 12:26:47,067 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.02 vs. limit=15.0 2024-08-15 12:26:58,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3182670.0, ans=0.0 2024-08-15 12:27:06,731 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 13950, loss[loss=0.1001, beats_loss=0.01199, ecapa_loss=0.0001479, whisper_loss=0.08662, over 21708.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01065, ecapa_loss=0.0001497, whisper_loss=0.09171, over 3904268.09 frames. ], batch size: 89, lr: 2.80e-03, grad_scale: 1.152921504606847e+18 2024-08-15 12:27:09,349 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.273e+01 2.481e+01 2.745e+01 4.473e+01, threshold=4.963e+01, percent-clipped=0.0 2024-08-15 12:27:14,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3182770.0, ans=0.0 2024-08-15 12:27:19,555 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-15 12:27:28,788 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.90 vs. limit=15.0 2024-08-15 12:27:49,447 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 
19 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-15 12:27:58,693 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3183070.0, ans=0.125 2024-08-15 12:28:00,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3183070.0, ans=0.125 2024-08-15 12:28:01,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3183070.0, ans=0.125 2024-08-15 12:28:06,869 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2024-08-15 12:28:10,915 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.61 vs. limit=15.0 2024-08-15 12:28:20,292 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 14000, loss[loss=0.09267, beats_loss=0.01194, ecapa_loss=0.0001605, whisper_loss=0.07913, over 16828.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01071, ecapa_loss=0.000149, whisper_loss=0.09112, over 3887261.28 frames. ], batch size: 73, lr: 2.80e-03, grad_scale: 1.152921504606847e+18 2024-08-15 12:28:20,521 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 39 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-15 12:28:21,977 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 21 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-15 12:28:39,800 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 26 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-15 12:29:12,331 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 
30 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-15 12:29:18,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3183670.0, ans=0.2 2024-08-15 12:29:34,308 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 14050, loss[loss=0.1257, beats_loss=0.008527, ecapa_loss=0.0001477, whisper_loss=0.1157, over 23463.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01066, ecapa_loss=0.0001487, whisper_loss=0.09161, over 3862338.39 frames. ], batch size: 87, lr: 2.80e-03, grad_scale: 1.152921504606847e+18 2024-08-15 12:29:37,266 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.178e+01 2.428e+01 2.740e+01 4.100e+01, threshold=4.856e+01, percent-clipped=0.0 2024-08-15 12:29:39,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3183770.0, ans=0.0 2024-08-15 12:30:05,185 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-15 12:30:07,456 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.32 vs. limit=15.0 2024-08-15 12:30:12,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3183970.0, ans=0.1 2024-08-15 12:30:22,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3184070.0, ans=0.04949747468305833 2024-08-15 12:30:36,875 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2024-08-15 12:30:50,218 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 14100, loss[loss=0.09375, beats_loss=0.01302, ecapa_loss=0.0001544, whisper_loss=0.07918, over 22342.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01066, ecapa_loss=0.0001486, whisper_loss=0.09116, over 3839047.70 frames. ], batch size: 93, lr: 2.80e-03, grad_scale: 1.152921504606847e+18 2024-08-15 12:31:09,545 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.677e+01 2024-08-15 12:31:10,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3184370.0, ans=0.125 2024-08-15 12:31:30,229 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.49 vs. limit=15.0 2024-08-15 12:31:30,239 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.11 vs. limit=15.0 2024-08-15 12:31:41,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3184570.0, ans=0.2 2024-08-15 12:31:53,748 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 22 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-15 12:32:03,064 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 14150, loss[loss=0.1469, beats_loss=0.007479, ecapa_loss=0.0001347, whisper_loss=0.1381, over 18035.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01065, ecapa_loss=0.0001495, whisper_loss=0.0913, over 3838132.34 frames. ], batch size: 64, lr: 2.80e-03, grad_scale: 1.152921504606847e+18 2024-08-15 12:32:06,095 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.412e+01 2.567e+01 2.890e+01 3.775e+01, threshold=5.134e+01, percent-clipped=0.0 2024-08-15 12:32:18,928 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
12 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 12:32:28,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3184870.0, ans=0.125 2024-08-15 12:32:43,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3184970.0, ans=0.125 2024-08-15 12:32:46,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3184970.0, ans=0.0 2024-08-15 12:33:03,332 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.11 vs. limit=15.0 2024-08-15 12:33:09,296 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-15 12:33:22,168 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 14200, loss[loss=0.1151, beats_loss=0.007117, ecapa_loss=0.0001673, whisper_loss=0.1063, over 13761.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01067, ecapa_loss=0.0001497, whisper_loss=0.09014, over 3855785.79 frames. ], batch size: 55, lr: 2.80e-03, grad_scale: 1.152921504606847e+18 2024-08-15 12:33:31,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3185270.0, ans=0.2 2024-08-15 12:33:33,540 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.20 vs. limit=6.0 2024-08-15 12:33:47,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3185370.0, ans=0.1 2024-08-15 12:33:47,439 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.97 vs. 
limit=15.0 2024-08-15 12:34:07,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3185570.0, ans=0.2 2024-08-15 12:34:28,335 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 36 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-15 12:34:37,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3185670.0, ans=0.1 2024-08-15 12:34:44,385 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 14250, loss[loss=0.09115, beats_loss=0.01233, ecapa_loss=0.0001316, whisper_loss=0.07751, over 23586.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01068, ecapa_loss=0.0001496, whisper_loss=0.09017, over 3873827.46 frames. ], batch size: 95, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:34:49,744 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.313e+01 2.543e+01 2.810e+01 4.306e+01, threshold=5.087e+01, percent-clipped=0.0 2024-08-15 12:35:08,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3185870.0, ans=0.0 2024-08-15 12:35:25,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3185970.0, ans=0.0 2024-08-15 12:35:27,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3185970.0, ans=0.125 2024-08-15 12:35:58,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3186070.0, ans=0.1 2024-08-15 12:36:10,480 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
28 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-15 12:36:18,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3186170.0, ans=0.125 2024-08-15 12:36:23,861 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 14300, loss[loss=0.1104, beats_loss=0.009628, ecapa_loss=0.0001321, whisper_loss=0.09944, over 19449.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01065, ecapa_loss=0.0001495, whisper_loss=0.08996, over 3865186.07 frames. ], batch size: 74, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:36:32,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3186270.0, ans=0.125 2024-08-15 12:36:46,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3186370.0, ans=0.125 2024-08-15 12:36:46,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3186370.0, ans=0.1 2024-08-15 12:36:52,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3186370.0, ans=0.0 2024-08-15 12:36:57,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3186370.0, ans=0.1 2024-08-15 12:36:59,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3186370.0, ans=0.2 2024-08-15 12:37:09,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3186470.0, ans=0.125 2024-08-15 12:37:14,947 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 
23 from LS+wenet, 16 from Vox, 15 fro AS 2024-08-15 12:37:18,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3186470.0, ans=0.125 2024-08-15 12:37:39,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3186570.0, ans=0.125 2024-08-15 12:37:43,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3186670.0, ans=0.2 2024-08-15 12:37:48,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3186670.0, ans=0.2 2024-08-15 12:37:49,660 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 34 from Vox, 33 fro AS 2024-08-15 12:37:51,805 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-15 12:37:56,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3186670.0, ans=0.125 2024-08-15 12:37:56,380 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. limit=15.0 2024-08-15 12:38:03,702 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 14350, loss[loss=0.09859, beats_loss=0.0118, ecapa_loss=0.0001294, whisper_loss=0.08549, over 22613.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01057, ecapa_loss=0.0001495, whisper_loss=0.09003, over 3850859.66 frames. 
], batch size: 91, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:38:09,846 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.292e+01 2.515e+01 2.764e+01 5.097e+01, threshold=5.030e+01, percent-clipped=1.0 2024-08-15 12:38:16,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3186770.0, ans=0.125 2024-08-15 12:38:31,459 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-15 12:38:42,442 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2024-08-15 12:38:44,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3186970.0, ans=0.125 2024-08-15 12:39:02,304 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-15 12:39:02,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3187070.0, ans=0.0 2024-08-15 12:39:06,480 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 13 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-15 12:39:19,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3187070.0, ans=0.0 2024-08-15 12:39:43,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3187170.0, ans=0.0 2024-08-15 12:39:46,720 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 14400, loss[loss=0.1105, beats_loss=0.009704, ecapa_loss=0.0001754, whisper_loss=0.09904, over 21395.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.0001497, whisper_loss=0.09045, over 3866568.12 frames. 
], batch size: 88, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:40:07,483 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.38 vs. limit=22.5 2024-08-15 12:40:11,643 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.67 vs. limit=22.5 2024-08-15 12:40:20,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3187370.0, ans=0.125 2024-08-15 12:40:37,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3187570.0, ans=0.125 2024-08-15 12:40:44,408 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 26 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-15 12:40:51,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3187670.0, ans=0.1 2024-08-15 12:41:05,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3187770.0, ans=0.0 2024-08-15 12:41:06,235 INFO [train_multi_KD3.py:1116] (2/4) Epoch 22, batch 14450, loss[loss=0.1045, beats_loss=0.01055, ecapa_loss=0.0001707, whisper_loss=0.09225, over 20796.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01058, ecapa_loss=0.0001498, whisper_loss=0.0909, over 3857327.16 frames. ], batch size: 86, lr: 2.80e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:41:12,122 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.715e+01 2.363e+01 2.570e+01 2.963e+01 1.669e+02, threshold=5.140e+01, percent-clipped=2.0 2024-08-15 12:41:28,216 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 
22 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-15 12:41:32,939 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.27 vs. limit=10.0 2024-08-15 12:41:47,528 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 12:41:52,903 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 22 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 12:41:57,116 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 30 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-15 12:41:58,536 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-15 12:42:01,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3188070.0, ans=0.2 2024-08-15 12:42:04,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3188170.0, ans=0.125 2024-08-15 12:42:46,634 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 0, loss[loss=0.109, beats_loss=0.01076, ecapa_loss=0.0001536, whisper_loss=0.0967, over 23176.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01076, ecapa_loss=0.0001536, whisper_loss=0.0967, over 23176.00 frames. ], batch size: 92, lr: 2.74e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:42:46,635 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-15 12:43:28,450 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on ASR_libri: loss=0.2517, beats_loss=0, ecapa_loss=0.0005338, whisper_loss=0.2464, over 922467.00 frames. 2024-08-15 12:43:45,291 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on SV_voxceleb1: loss=0.00428, beats_loss=0, ecapa_loss=0.000428, whisper_loss=0, over 939242.00 frames. 
2024-08-15 12:45:01,989 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.9277, 2.3499, 2.4661, 3.0808], device='cuda:2') 2024-08-15 12:45:43,928 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on AT_audioset: loss=0.02325, beats_loss=0.02325, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 12:45:43,931 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-15 12:45:56,943 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 26 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-15 12:46:03,311 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 25 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-15 12:46:15,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3188320.0, ans=0.125 2024-08-15 12:46:26,628 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-15 12:47:02,020 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 28 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-15 12:47:09,579 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-15 12:47:15,036 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 22 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-15 12:47:15,315 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 12:47:22,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3188520.0, ans=0.125 2024-08-15 12:47:34,419 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 21 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-15 12:47:50,203 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.13 vs. 
limit=6.0 2024-08-15 12:47:50,357 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 50, loss[loss=0.09134, beats_loss=0.01207, ecapa_loss=0.0001398, whisper_loss=0.07787, over 20467.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.009625, ecapa_loss=0.0001545, whisper_loss=0.09128, over 902027.31 frames. ], batch size: 81, lr: 2.74e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:47:59,681 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.97 vs. limit=15.0 2024-08-15 12:48:13,954 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.413e+01 2.733e+01 3.074e+01 3.899e+01, threshold=5.466e+01, percent-clipped=0.0 2024-08-15 12:48:33,971 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.29 vs. limit=6.0 2024-08-15 12:48:35,279 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 27 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-15 12:49:38,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3189120.0, ans=0.0 2024-08-15 12:49:40,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3189120.0, ans=0.125 2024-08-15 12:49:49,917 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 100, loss[loss=0.1117, beats_loss=0.008511, ecapa_loss=0.0001422, whisper_loss=0.1018, over 20571.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.009514, ecapa_loss=0.0001517, whisper_loss=0.09009, over 1541848.20 frames. ], batch size: 80, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:50:07,922 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.37 vs. 
limit=22.5 2024-08-15 12:50:34,568 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 22 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-15 12:50:34,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3189420.0, ans=0.2 2024-08-15 12:50:50,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3189420.0, ans=0.125 2024-08-15 12:51:04,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3189520.0, ans=0.125 2024-08-15 12:51:21,723 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.59 vs. limit=15.0 2024-08-15 12:51:28,099 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2024-08-15 12:51:41,281 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 150, loss[loss=0.08374, beats_loss=0.0118, ecapa_loss=0.0001713, whisper_loss=0.07023, over 18317.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.009586, ecapa_loss=0.0001514, whisper_loss=0.08976, over 2050177.74 frames. ], batch size: 78, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:51:45,464 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 10 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-15 12:51:57,047 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.145e+01 2.541e+01 2.794e+01 3.145e+01 4.567e+01, threshold=5.588e+01, percent-clipped=0.0 2024-08-15 12:52:02,673 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0 2024-08-15 12:52:17,516 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
18 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-15 12:52:19,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3189920.0, ans=0.0 2024-08-15 12:52:47,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3190020.0, ans=0.125 2024-08-15 12:52:55,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3190120.0, ans=0.125 2024-08-15 12:53:01,534 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-15 12:53:05,920 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 200, loss[loss=0.0851, beats_loss=0.01111, ecapa_loss=0.0001288, whisper_loss=0.0727, over 15064.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.009737, ecapa_loss=0.0001509, whisper_loss=0.09006, over 2437363.59 frames. ], batch size: 58, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:53:12,180 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.38 vs. limit=15.0 2024-08-15 12:53:22,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3190320.0, ans=0.125 2024-08-15 12:53:30,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3190320.0, ans=0.1 2024-08-15 12:53:54,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3190520.0, ans=0.125 2024-08-15 12:53:58,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3190520.0, ans=0.125 2024-08-15 12:54:06,489 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
25 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-15 12:54:07,269 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.82 vs. limit=15.0 2024-08-15 12:54:15,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3190620.0, ans=0.125 2024-08-15 12:54:24,675 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 250, loss[loss=0.1127, beats_loss=0.008836, ecapa_loss=0.000117, whisper_loss=0.1027, over 22897.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.009853, ecapa_loss=0.0001507, whisper_loss=0.09058, over 2726376.66 frames. ], batch size: 82, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:54:28,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3190720.0, ans=0.0 2024-08-15 12:54:38,830 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.620e+01 2.264e+01 2.507e+01 2.916e+01 4.701e+01, threshold=5.014e+01, percent-clipped=0.0 2024-08-15 12:54:51,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3190820.0, ans=0.1 2024-08-15 12:54:56,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3190920.0, ans=0.0 2024-08-15 12:54:56,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3190920.0, ans=0.0 2024-08-15 12:54:57,901 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 12:54:59,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3190920.0, ans=0.0 2024-08-15 12:55:10,733 INFO [scaling.py:1024] (2/4) Whitening: 
name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.92 vs. limit=6.0 2024-08-15 12:55:30,330 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.14 vs. limit=15.0 2024-08-15 12:55:35,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3191120.0, ans=0.0 2024-08-15 12:55:36,883 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 16 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-15 12:55:41,390 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 300, loss[loss=0.0772, beats_loss=0.01072, ecapa_loss=0.0001524, whisper_loss=0.06495, over 21002.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01002, ecapa_loss=0.000151, whisper_loss=0.08958, over 2949315.96 frames. ], batch size: 85, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:55:43,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3191220.0, ans=0.125 2024-08-15 12:55:44,242 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=6.15 vs. limit=12.0 2024-08-15 12:55:47,045 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.58 vs. 
limit=22.5 2024-08-15 12:55:48,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3191220.0, ans=0.05 2024-08-15 12:55:53,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3191220.0, ans=0.125 2024-08-15 12:55:55,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3191220.0, ans=0.2 2024-08-15 12:56:01,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3191320.0, ans=0.125 2024-08-15 12:56:06,357 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.23 vs. limit=15.0 2024-08-15 12:56:10,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3191320.0, ans=0.2 2024-08-15 12:56:22,141 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.60 vs. limit=15.0 2024-08-15 12:56:23,090 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 18 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-15 12:56:39,629 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-15 12:56:43,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3191620.0, ans=0.0 2024-08-15 12:56:55,531 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.88 vs. limit=12.0 2024-08-15 12:56:58,857 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 350, loss[loss=0.1254, beats_loss=0.007748, ecapa_loss=0.0001961, whisper_loss=0.1157, over 15269.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01015, ecapa_loss=0.000151, whisper_loss=0.09037, over 3154626.93 frames. ], batch size: 63, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:57:05,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3191720.0, ans=0.0 2024-08-15 12:57:12,588 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.342e+01 2.524e+01 2.862e+01 4.157e+01, threshold=5.047e+01, percent-clipped=0.0 2024-08-15 12:57:21,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=3191820.0, ans=0.2 2024-08-15 12:57:32,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3191920.0, ans=0.0 2024-08-15 12:57:35,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3191920.0, ans=0.025 2024-08-15 12:57:54,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3192020.0, ans=0.125 2024-08-15 12:58:16,048 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 400, loss[loss=0.08724, beats_loss=0.009827, ecapa_loss=0.0001505, whisper_loss=0.0759, over 18293.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0103, ecapa_loss=0.0001498, whisper_loss=0.08931, over 3314813.19 frames. ], batch size: 74, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:58:24,326 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 25 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-15 12:58:27,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=3192220.0, ans=0.2 2024-08-15 12:58:35,663 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 
31 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-15 12:58:39,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3192320.0, ans=0.125 2024-08-15 12:58:47,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3192420.0, ans=0.1 2024-08-15 12:59:12,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3192520.0, ans=0.0 2024-08-15 12:59:13,199 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2024-08-15 12:59:14,739 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.76 vs. limit=15.0 2024-08-15 12:59:19,240 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.76 vs. limit=22.5 2024-08-15 12:59:26,375 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 12:59:35,517 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 450, loss[loss=0.1112, beats_loss=0.01011, ecapa_loss=0.0001297, whisper_loss=0.09977, over 19191.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01035, ecapa_loss=0.000149, whisper_loss=0.08912, over 3449804.45 frames. ], batch size: 76, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:59:37,135 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
17 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-15 12:59:49,628 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.262e+01 2.454e+01 2.784e+01 4.737e+01, threshold=4.907e+01, percent-clipped=0.0 2024-08-15 12:59:57,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3192820.0, ans=0.1 2024-08-15 13:00:13,921 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.30 vs. limit=15.0 2024-08-15 13:00:18,708 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 19 from LS+wenet, 25 from Vox, 47 fro AS 2024-08-15 13:00:29,651 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.57 vs. limit=15.0 2024-08-15 13:00:33,039 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.180e+05 2024-08-15 13:00:53,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3193120.0, ans=0.1 2024-08-15 13:00:58,583 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 500, loss[loss=0.1037, beats_loss=0.009603, ecapa_loss=0.000182, whisper_loss=0.09227, over 20659.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01035, ecapa_loss=0.0001495, whisper_loss=0.08893, over 3562548.17 frames. ], batch size: 84, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:01:02,872 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 18 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-15 13:01:08,971 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.15 vs. 
limit=10.0 2024-08-15 13:01:30,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3193320.0, ans=0.0 2024-08-15 13:01:35,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3193420.0, ans=0.125 2024-08-15 13:01:38,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3193420.0, ans=0.0 2024-08-15 13:01:52,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3193520.0, ans=0.0 2024-08-15 13:02:04,096 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 21 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-15 13:02:04,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3193520.0, ans=0.1 2024-08-15 13:02:07,985 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 24 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-15 13:02:30,179 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 550, loss[loss=0.09727, beats_loss=0.01179, ecapa_loss=0.0001229, whisper_loss=0.08425, over 21863.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01045, ecapa_loss=0.0001494, whisper_loss=0.08879, over 3614394.75 frames. ], batch size: 88, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:02:30,419 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-15 13:02:41,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3193720.0, ans=0.2 2024-08-15 13:02:45,512 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.695e+01 2.304e+01 2.513e+01 2.793e+01 3.514e+01, threshold=5.025e+01, percent-clipped=0.0 2024-08-15 13:02:48,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3193820.0, ans=0.0 2024-08-15 13:02:57,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3193820.0, ans=0.125 2024-08-15 13:03:01,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3193820.0, ans=0.0 2024-08-15 13:03:01,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3193820.0, ans=0.09899494936611666 2024-08-15 13:03:16,132 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 18 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 13:03:37,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3194020.0, ans=0.125 2024-08-15 13:03:39,898 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.24 vs. limit=15.0 2024-08-15 13:03:56,382 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 600, loss[loss=0.1089, beats_loss=0.01259, ecapa_loss=0.0001396, whisper_loss=0.09492, over 14336.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01045, ecapa_loss=0.0001494, whisper_loss=0.08888, over 3642580.41 frames. 
], batch size: 55, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:04:00,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3194220.0, ans=0.0 2024-08-15 13:04:05,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3194220.0, ans=0.1 2024-08-15 13:04:11,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3194320.0, ans=0.125 2024-08-15 13:04:26,569 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-15 13:04:46,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3194520.0, ans=0.2 2024-08-15 13:04:50,094 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 23 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-15 13:04:50,549 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.20 vs. limit=10.0 2024-08-15 13:05:03,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3194620.0, ans=0.1 2024-08-15 13:05:07,084 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 650, loss[loss=0.08147, beats_loss=0.01435, ecapa_loss=0.0001101, whisper_loss=0.06602, over 15502.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01049, ecapa_loss=0.0001483, whisper_loss=0.08847, over 3673118.50 frames. 
], batch size: 58, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:05:18,197 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.357e+01 2.569e+01 2.869e+01 2.947e+02, threshold=5.138e+01, percent-clipped=4.0 2024-08-15 13:05:41,478 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2024-08-15 13:05:50,071 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 36 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-15 13:06:07,421 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 22 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-15 13:06:07,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3195120.0, ans=15.0 2024-08-15 13:06:10,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3195120.0, ans=0.1 2024-08-15 13:06:12,308 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 700, loss[loss=0.09657, beats_loss=0.01148, ecapa_loss=0.0001381, whisper_loss=0.08371, over 21371.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01053, ecapa_loss=0.0001483, whisper_loss=0.08874, over 3706288.28 frames. ], batch size: 87, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:06:32,869 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-15 13:06:41,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3195420.0, ans=0.0 2024-08-15 13:06:42,816 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 19 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 13:06:50,106 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
25 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-15 13:07:01,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3195520.0, ans=0.125 2024-08-15 13:07:16,519 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 750, loss[loss=0.112, beats_loss=0.01175, ecapa_loss=0.0001415, whisper_loss=0.09879, over 22565.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01053, ecapa_loss=0.000147, whisper_loss=0.089, over 3737822.26 frames. ], batch size: 89, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:07:22,328 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.90 vs. limit=22.5 2024-08-15 13:07:28,510 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.716e+01 2.327e+01 2.582e+01 2.848e+01 1.200e+02, threshold=5.164e+01, percent-clipped=2.0 2024-08-15 13:07:34,364 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.91 vs. limit=12.0 2024-08-15 13:07:46,064 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.54 vs. limit=12.0 2024-08-15 13:07:55,334 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.98 vs. 
limit=15.0
2024-08-15 13:07:57,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3196020.0, ans=0.1
2024-08-15 13:07:57,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3196020.0, ans=0.2
2024-08-15 13:08:01,460 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.145e+01
2024-08-15 13:08:21,592 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 800, loss[loss=0.07853, beats_loss=0.01289, ecapa_loss=0.0001064, whisper_loss=0.06458, over 19090.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01051, ecapa_loss=0.0001482, whisper_loss=0.08923, over 3815053.75 frames. ], batch size: 75, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 13:08:48,231 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 19 from LS+wenet, 19 from Vox, 26 from AS
2024-08-15 13:09:10,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3196520.0, ans=0.125
2024-08-15 13:09:16,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3196620.0, ans=0.09899494936611666
2024-08-15 13:09:22,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3196620.0, ans=0.2
2024-08-15 13:09:23,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3196620.0, ans=0.125
2024-08-15 13:09:26,890 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.91 vs. limit=15.0
2024-08-15 13:09:27,239 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 850, loss[loss=0.1017, beats_loss=0.009894, ecapa_loss=0.0001503, whisper_loss=0.09028, over 22527.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01054, ecapa_loss=0.000147, whisper_loss=0.0885, over 3831078.31 frames. ], batch size: 84, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 13:09:34,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3196720.0, ans=0.125
2024-08-15 13:09:38,925 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.346e+01 2.636e+01 2.893e+01 3.086e+02, threshold=5.271e+01, percent-clipped=3.0
2024-08-15 13:09:45,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3196820.0, ans=10.0
2024-08-15 13:09:45,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3196820.0, ans=0.125
2024-08-15 13:10:07,687 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 30 from LS+wenet, 16 from Vox, 27 from AS
2024-08-15 13:10:21,527 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.35 vs. limit=15.0
2024-08-15 13:10:21,615 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.67 vs. limit=15.0
2024-08-15 13:10:27,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3197120.0, ans=0.2
2024-08-15 13:10:30,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3197120.0, ans=0.04949747468305833
2024-08-15 13:10:33,017 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 900, loss[loss=0.09874, beats_loss=0.01224, ecapa_loss=0.0001259, whisper_loss=0.08524, over 23754.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01044, ecapa_loss=0.0001477, whisper_loss=0.08912, over 3857329.41 frames. ], batch size: 92, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 13:10:33,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3197220.0, ans=0.125
2024-08-15 13:10:36,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3197220.0, ans=0.1
2024-08-15 13:10:51,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3197320.0, ans=0.125
2024-08-15 13:10:53,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3197320.0, ans=0.125
2024-08-15 13:11:00,456 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 21 from LS+wenet, 15 from Vox, 21 from AS
2024-08-15 13:11:04,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3197420.0, ans=0.0
2024-08-15 13:11:13,729 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 18 from Vox, 43 from AS
2024-08-15 13:11:16,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3197520.0, ans=0.125
2024-08-15 13:11:21,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3197520.0, ans=10.0
2024-08-15 13:11:38,305 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 950, loss[loss=0.09709, beats_loss=0.01204, ecapa_loss=0.000162, whisper_loss=0.08343, over 21491.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01043, ecapa_loss=0.0001484, whisper_loss=0.08934, over 3857939.16 frames. ], batch size: 84, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 13:11:49,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3197720.0, ans=0.0
2024-08-15 13:11:50,130 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.722e+01 2.292e+01 2.595e+01 2.867e+01 1.968e+02, threshold=5.190e+01, percent-clipped=1.0
2024-08-15 13:11:50,381 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 21 from LS+wenet, 22 from Vox, 47 from AS
2024-08-15 13:11:51,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3197820.0, ans=0.0
2024-08-15 13:11:54,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3197820.0, ans=0.1
2024-08-15 13:12:25,760 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 28 from LS+wenet, 19 from Vox, 35 from AS
2024-08-15 13:12:26,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3198020.0, ans=0.125
2024-08-15 13:12:28,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3198020.0, ans=0.125
2024-08-15 13:12:44,257 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1000, loss[loss=0.101, beats_loss=0.01215, ecapa_loss=0.0001485, whisper_loss=0.08734, over 21527.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01046, ecapa_loss=0.0001475, whisper_loss=0.08931, over 3848973.09 frames. ], batch size: 87, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 13:13:08,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3198320.0, ans=0.125
2024-08-15 13:13:08,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3198320.0, ans=0.125
2024-08-15 13:13:19,721 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.13 vs. limit=15.0
2024-08-15 13:13:21,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3198420.0, ans=0.2
2024-08-15 13:13:36,657 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.33 vs. limit=15.0
2024-08-15 13:13:43,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3198620.0, ans=0.0
2024-08-15 13:13:47,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3198620.0, ans=0.1
2024-08-15 13:13:49,665 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1050, loss[loss=0.09254, beats_loss=0.01235, ecapa_loss=0.0001564, whisper_loss=0.07863, over 17245.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01046, ecapa_loss=0.0001475, whisper_loss=0.08865, over 3797619.01 frames. ], batch size: 71, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 13:14:01,321 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.361e+01 2.585e+01 2.930e+01 4.862e+01, threshold=5.170e+01, percent-clipped=0.0
2024-08-15 13:14:09,545 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.232e+01
2024-08-15 13:14:18,766 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 from AS
2024-08-15 13:14:42,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3199120.0, ans=0.1
2024-08-15 13:14:42,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3199120.0, ans=0.0
2024-08-15 13:14:53,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3199220.0, ans=0.125
2024-08-15 13:14:54,316 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1100, loss[loss=0.09628, beats_loss=0.01122, ecapa_loss=0.0001347, whisper_loss=0.08372, over 17005.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01045, ecapa_loss=0.0001471, whisper_loss=0.08935, over 3821420.26 frames. ], batch size: 67, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 13:14:59,028 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=8.161e+00
2024-08-15 13:15:01,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3199220.0, ans=0.0
2024-08-15 13:15:16,201 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 18 from Vox, 22 from AS
2024-08-15 13:15:22,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3199420.0, ans=0.125
2024-08-15 13:15:24,355 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 18 from Vox, 27 from AS
2024-08-15 13:15:25,665 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 24 from LS+wenet, 18 from Vox, 29 from AS
2024-08-15 13:15:25,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3199420.0, ans=0.2
2024-08-15 13:15:59,813 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1150, loss[loss=0.1089, beats_loss=0.009546, ecapa_loss=0.0001682, whisper_loss=0.09767, over 17722.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01038, ecapa_loss=0.0001482, whisper_loss=0.08987, over 3804168.03 frames. ], batch size: 71, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 13:16:07,319 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.43 vs. limit=15.0
2024-08-15 13:16:11,518 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.313e+01 2.590e+01 2.898e+01 5.614e+01, threshold=5.180e+01, percent-clipped=1.0
2024-08-15 13:16:18,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3199820.0, ans=0.125
2024-08-15 13:16:26,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3199920.0, ans=0.125
2024-08-15 13:16:26,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3199920.0, ans=0.125
2024-08-15 13:16:28,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3199920.0, ans=0.0
2024-08-15 13:16:40,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3199920.0, ans=0.2
2024-08-15 13:16:42,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3200020.0, ans=0.0
2024-08-15 13:17:01,365 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 17 from LS+wenet, 18 from Vox, 35 from AS
2024-08-15 13:17:01,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3200120.0, ans=0.0
2024-08-15 13:17:07,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3200120.0, ans=0.125
2024-08-15 13:17:09,282 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1200, loss[loss=0.1195, beats_loss=0.00834, ecapa_loss=0.0001251, whisper_loss=0.1099, over 18898.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01046, ecapa_loss=0.000147, whisper_loss=0.08951, over 3777380.69 frames. ], batch size: 71, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 13:17:10,747 WARNING [optim.py:496] (2/4) Scaling gradients by 0.052070412784814835, model_norm_threshold=51.8048095703125
2024-08-15 13:17:10,921 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.799e+05, grad_sumsq=1.791e+07, orig_rms_sq=1.005e-02
2024-08-15 13:17:19,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3200220.0, ans=0.125
2024-08-15 13:17:21,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3200320.0, ans=0.2
2024-08-15 13:17:28,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3200320.0, ans=0.125
2024-08-15 13:17:35,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3200420.0, ans=0.2
2024-08-15 13:17:42,020 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.47 vs. limit=10.0
2024-08-15 13:17:48,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3200520.0, ans=10.0
2024-08-15 13:17:49,807 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 17 from Vox, 25 from AS
2024-08-15 13:17:51,705 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.36 vs. limit=6.0
2024-08-15 13:17:55,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3200520.0, ans=0.125
2024-08-15 13:18:02,157 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=17.97 vs. limit=15.0
2024-08-15 13:18:08,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3200620.0, ans=0.125
2024-08-15 13:18:15,662 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1250, loss[loss=0.1334, beats_loss=0.007259, ecapa_loss=0.0001326, whisper_loss=0.1248, over 22999.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01044, ecapa_loss=0.0001468, whisper_loss=0.08974, over 3788168.39 frames. ], batch size: 86, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 13:18:27,343 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.694e+01 2.258e+01 2.452e+01 2.719e+01 9.949e+02, threshold=4.904e+01, percent-clipped=2.0
2024-08-15 13:18:27,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3200820.0, ans=0.125
2024-08-15 13:18:29,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3200820.0, ans=0.125
2024-08-15 13:18:33,011 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 17 from LS+wenet, 20 from Vox, 35 from AS
2024-08-15 13:18:42,122 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 32 from Vox, 31 from AS
2024-08-15 13:18:42,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3200920.0, ans=0.1
2024-08-15 13:18:43,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3200920.0, ans=0.0
2024-08-15 13:18:48,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3200920.0, ans=0.125
2024-08-15 13:18:54,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3201020.0, ans=0.125
2024-08-15 13:18:55,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3201020.0, ans=0.125
2024-08-15 13:19:04,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3201020.0, ans=0.125
2024-08-15 13:19:06,735 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.80 vs. limit=15.0
2024-08-15 13:19:12,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3201120.0, ans=0.0
2024-08-15 13:19:19,963 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 25 from LS+wenet, 24 from Vox, 36 from AS
2024-08-15 13:19:21,112 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1300, loss[loss=0.1015, beats_loss=0.0113, ecapa_loss=0.0001495, whisper_loss=0.08869, over 21171.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01045, ecapa_loss=0.000148, whisper_loss=0.08988, over 3813517.03 frames. ], batch size: 85, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 13:19:24,146 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 36 from LS+wenet, 13 from Vox, 41 from AS
2024-08-15 13:19:28,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3201220.0, ans=0.125
2024-08-15 13:19:29,588 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 20 from Vox, 21 from AS
2024-08-15 13:19:35,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3201320.0, ans=0.125
2024-08-15 13:19:42,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3201320.0, ans=0.125
2024-08-15 13:19:49,208 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 19 from LS+wenet, 30 from Vox, 33 from AS
2024-08-15 13:20:06,082 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 14 from Vox, 27 from AS
2024-08-15 13:20:14,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3201620.0, ans=0.125
2024-08-15 13:20:14,918 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-15 13:20:23,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3201620.0, ans=0.125
2024-08-15 13:20:27,032 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1350, loss[loss=0.1121, beats_loss=0.01051, ecapa_loss=0.0001411, whisper_loss=0.1002, over 21283.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01047, ecapa_loss=0.0001475, whisper_loss=0.08967, over 3811532.12 frames. ], batch size: 83, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 13:20:39,293 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.225e+01 2.528e+01 2.736e+01 6.244e+01, threshold=5.056e+01, percent-clipped=1.0
2024-08-15 13:20:48,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3201820.0, ans=0.1
2024-08-15 13:20:56,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3201920.0, ans=0.1
2024-08-15 13:20:59,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3201920.0, ans=0.0
2024-08-15 13:21:05,349 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 17 from LS+wenet, 14 from Vox, 25 from AS
2024-08-15 13:21:15,013 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 33 from LS+wenet, 21 from Vox, 39 from AS
2024-08-15 13:21:22,161 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 12 from LS+wenet, 13 from Vox, 34 from AS
2024-08-15 13:21:34,888 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1400, loss[loss=0.09942, beats_loss=0.0113, ecapa_loss=0.0001064, whisper_loss=0.08706, over 20860.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01045, ecapa_loss=0.0001488, whisper_loss=0.08911, over 3808655.12 frames. ], batch size: 79, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 13:21:35,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3202220.0, ans=0.1
2024-08-15 13:21:39,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3202220.0, ans=0.125
2024-08-15 13:21:54,307 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-15 13:21:58,263 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 23 from Vox, 30 from AS
2024-08-15 13:22:16,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3202520.0, ans=0.125
2024-08-15 13:22:20,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3202520.0, ans=0.1
2024-08-15 13:22:47,478 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1450, loss[loss=0.1091, beats_loss=0.007364, ecapa_loss=0.0001571, whisper_loss=0.1001, over 14988.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01047, ecapa_loss=0.0001477, whisper_loss=0.08892, over 3803995.02 frames. ], batch size: 57, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 13:22:47,643 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 23 from LS+wenet, 24 from Vox, 28 from AS
2024-08-15 13:23:15,758 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 17 from Vox, 21 from AS
2024-08-15 13:23:20,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3202720.0, ans=0.0
2024-08-15 13:23:24,043 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.773e+01 2.256e+01 2.495e+01 2.819e+01 4.681e+02, threshold=4.990e+01, percent-clipped=2.0
2024-08-15 13:23:26,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3202820.0, ans=10.0
2024-08-15 13:23:30,877 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 21 from LS+wenet, 26 from Vox, 42 from AS
2024-08-15 13:23:36,209 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 18 from Vox, 24 from AS
2024-08-15 13:23:39,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3202920.0, ans=0.0
2024-08-15 13:23:39,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3202920.0, ans=0.125
2024-08-15 13:23:40,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3202920.0, ans=0.125
2024-08-15 13:23:49,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3202920.0, ans=0.05
2024-08-15 13:23:52,958 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 28 from LS+wenet, 20 from Vox, 45 from AS
2024-08-15 13:24:11,236 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 23 from LS+wenet, 20 from Vox, 42 from AS
2024-08-15 13:24:25,774 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1500, loss[loss=0.09913, beats_loss=0.009895, ecapa_loss=0.000129, whisper_loss=0.08794, over 14636.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01046, ecapa_loss=0.000148, whisper_loss=0.08865, over 3802845.14 frames. ], batch size: 55, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 13:24:29,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3203220.0, ans=0.1
2024-08-15 13:24:45,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3203320.0, ans=0.125
2024-08-15 13:24:54,973 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 from AS
2024-08-15 13:24:56,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3203420.0, ans=0.1
2024-08-15 13:25:04,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3203420.0, ans=0.1
2024-08-15 13:25:12,270 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 23 from LS+wenet, 16 from Vox, 20 from AS
2024-08-15 13:25:27,631 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 19 from LS+wenet, 27 from Vox, 44 from AS
2024-08-15 13:25:36,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3203620.0, ans=0.125
2024-08-15 13:25:38,800 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1550, loss[loss=0.1117, beats_loss=0.01108, ecapa_loss=0.0001216, whisper_loss=0.09944, over 19184.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01057, ecapa_loss=0.0001467, whisper_loss=0.08809, over 3821405.09 frames. ], batch size: 71, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 13:25:39,100 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 22 from Vox, 33 from AS
2024-08-15 13:25:42,426 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 20 from LS+wenet, 14 from Vox, 38 from AS
2024-08-15 13:25:49,098 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 21 from Vox, 35 from AS
2024-08-15 13:25:51,802 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.256e+01 2.497e+01 2.794e+01 4.870e+01, threshold=4.993e+01, percent-clipped=0.0
2024-08-15 13:25:57,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3203820.0, ans=0.125
2024-08-15 13:26:02,061 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 23 from LS+wenet, 16 from Vox, 20 from AS
2024-08-15 13:26:23,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3204020.0, ans=0.2
2024-08-15 13:26:28,012 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-15 13:26:38,226 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. limit=15.0
2024-08-15 13:26:54,819 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1600, loss[loss=0.1108, beats_loss=0.01202, ecapa_loss=0.000103, whisper_loss=0.09772, over 18602.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0105, ecapa_loss=0.0001465, whisper_loss=0.08928, over 3840654.37 frames. ], batch size: 68, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 13:27:11,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3204320.0, ans=0.0
2024-08-15 13:27:26,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3204420.0, ans=0.0
2024-08-15 13:27:26,466 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.57 vs. limit=10.0
2024-08-15 13:27:31,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3204420.0, ans=0.125
2024-08-15 13:27:31,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3204420.0, ans=0.1
2024-08-15 13:27:33,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3204420.0, ans=0.0
2024-08-15 13:27:48,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3204520.0, ans=0.2
2024-08-15 13:27:57,456 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-15 13:28:08,832 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1650, loss[loss=0.09264, beats_loss=0.01178, ecapa_loss=0.0001257, whisper_loss=0.0796, over 18811.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0106, ecapa_loss=0.0001447, whisper_loss=0.08943, over 3844710.99 frames. ], batch size: 75, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 13:28:11,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3204720.0, ans=0.125
2024-08-15 13:28:21,704 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.261e+01 2.464e+01 2.812e+01 1.426e+02, threshold=4.927e+01, percent-clipped=1.0
2024-08-15 13:28:22,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3204820.0, ans=0.0
2024-08-15 13:28:24,690 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 24 from Vox, 26 from AS
2024-08-15 13:28:36,447 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 21 from Vox, 43 from AS
2024-08-15 13:29:18,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3205120.0, ans=0.1
2024-08-15 13:29:22,898 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1700, loss[loss=0.1293, beats_loss=0.007505, ecapa_loss=0.0001439, whisper_loss=0.1203, over 16255.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01052, ecapa_loss=0.0001455, whisper_loss=0.09003, over 3844434.28 frames. ], batch size: 60, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 13:30:00,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3205420.0, ans=0.0
2024-08-15 13:30:04,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3205420.0, ans=0.125
2024-08-15 13:30:06,598 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.13 vs. limit=15.0
2024-08-15 13:30:37,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3205720.0, ans=0.125
2024-08-15 13:30:38,599 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1750, loss[loss=0.1078, beats_loss=0.01069, ecapa_loss=0.0001469, whisper_loss=0.09567, over 22530.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01054, ecapa_loss=0.0001456, whisper_loss=0.0903, over 3880969.66 frames. ], batch size: 88, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 13:30:51,551 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.735e+01 2.279e+01 2.476e+01 2.729e+01 6.838e+01, threshold=4.951e+01, percent-clipped=2.0
2024-08-15 13:30:55,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3205820.0, ans=0.1
2024-08-15 13:30:55,549 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.01 vs. limit=22.5
2024-08-15 13:30:56,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3205820.0, ans=0.125
2024-08-15 13:30:58,537 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.990e-01
2024-08-15 13:31:29,045 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 19 from LS+wenet, 14 from Vox, 29 from AS
2024-08-15 13:31:50,709 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 18 from Vox, 33 from AS
2024-08-15 13:31:53,495 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1800, loss[loss=0.1078, beats_loss=0.008888, ecapa_loss=0.0001439, whisper_loss=0.09744, over 21622.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01053, ecapa_loss=0.0001452, whisper_loss=0.09031, over 3885249.36 frames. ], batch size: 86, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 13:31:53,930 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 16 from Vox, 25 from AS
2024-08-15 13:32:11,204 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 23 from Vox, 27 from AS
2024-08-15 13:32:27,119 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 19 from LS+wenet, 27 from Vox, 27 from AS
2024-08-15 13:32:32,919 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-15 13:32:44,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3206520.0, ans=0.0
2024-08-15 13:33:06,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3206720.0, ans=0.125
2024-08-15 13:33:06,797 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1850, loss[loss=0.1052, beats_loss=0.01151, ecapa_loss=0.0001306, whisper_loss=0.09239, over 16841.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.0001456, whisper_loss=0.08996, over 3857895.47 frames. ], batch size: 66, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 13:33:12,023 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.57 vs. limit=22.5
2024-08-15 13:33:17,697 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 23 from Vox, 33 from AS
2024-08-15 13:33:20,045 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.257e+01 2.507e+01 2.743e+01 3.719e+01, threshold=5.013e+01, percent-clipped=0.0
2024-08-15 13:33:20,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3206820.0, ans=0.025
2024-08-15 13:33:26,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3206820.0, ans=0.125
2024-08-15 13:33:40,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3206920.0, ans=0.125
2024-08-15 13:33:44,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3206920.0, ans=0.2
2024-08-15 13:34:01,368 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 15 from LS+wenet, 20 from Vox, 29 from AS
2024-08-15 13:34:12,015 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=15.0
2024-08-15 13:34:21,153 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1900, loss[loss=0.09561, beats_loss=0.01185, ecapa_loss=0.0001378, whisper_loss=0.08238, over 21283.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01045, ecapa_loss=0.0001454, whisper_loss=0.09036, over 3851493.58 frames. ], batch size: 89, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 13:34:38,738 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 26 from LS+wenet, 26 from Vox, 36 from AS
2024-08-15 13:34:52,750 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-15 13:35:09,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3207520.0, ans=0.0
2024-08-15 13:35:18,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3207520.0, ans=0.125
2024-08-15 13:35:26,576 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 18 from LS+wenet, 23 from Vox, 51 from AS
2024-08-15 13:35:28,098 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 27 from LS+wenet, 15 from Vox, 20 from AS
2024-08-15 13:35:34,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3207620.0, ans=0.0
2024-08-15 13:35:36,863 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 1950, loss[loss=0.09544, beats_loss=0.01049, ecapa_loss=0.0001291, whisper_loss=0.08366, over 18399.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01045, ecapa_loss=0.000146, whisper_loss=0.09012, over 3872965.46 frames.
], batch size: 72, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:35:39,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3207720.0, ans=0.0 2024-08-15 13:35:44,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3207720.0, ans=0.125 2024-08-15 13:35:44,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3207720.0, ans=0.125 2024-08-15 13:35:49,892 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.666e+01 2.350e+01 2.550e+01 2.908e+01 4.451e+01, threshold=5.100e+01, percent-clipped=0.0 2024-08-15 13:35:51,831 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-15 13:36:01,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3207820.0, ans=0.0 2024-08-15 13:36:03,259 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-15 13:36:08,699 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.518e-03 2024-08-15 13:36:11,480 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 13:36:11,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3207920.0, ans=0.125 2024-08-15 13:36:12,918 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-15 13:36:23,332 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3208020.0, ans=0.0 2024-08-15 13:36:24,298 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
15 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-15 13:36:25,985 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-15 13:36:27,949 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 36 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-15 13:36:41,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3208120.0, ans=0.1 2024-08-15 13:36:50,747 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2000, loss[loss=0.09388, beats_loss=0.01292, ecapa_loss=0.0001432, whisper_loss=0.07953, over 21794.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01049, ecapa_loss=0.0001457, whisper_loss=0.08978, over 3841599.61 frames. ], batch size: 91, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:37:02,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=3208220.0, ans=0.02 2024-08-15 13:37:04,377 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.49 vs. limit=12.0 2024-08-15 13:37:34,232 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.89 vs. limit=15.0 2024-08-15 13:37:43,142 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 15 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-15 13:37:47,307 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 22 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-15 13:37:53,538 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 15 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-15 13:38:05,872 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
31 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-15 13:38:08,450 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2050, loss[loss=0.08551, beats_loss=0.01417, ecapa_loss=0.0001404, whisper_loss=0.06994, over 18344.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01048, ecapa_loss=0.000146, whisper_loss=0.08935, over 3825875.76 frames. ], batch size: 76, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:38:16,001 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.43 vs. limit=22.5 2024-08-15 13:38:20,854 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 25 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-15 13:38:22,076 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.750e+01 2.259e+01 2.501e+01 2.773e+01 1.854e+02, threshold=5.002e+01, percent-clipped=2.0 2024-08-15 13:38:22,556 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-15 13:38:30,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3208820.0, ans=0.125 2024-08-15 13:38:45,802 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-15 13:38:51,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3208920.0, ans=10.0 2024-08-15 13:39:04,405 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-15 13:39:22,695 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2100, loss[loss=0.121, beats_loss=0.008684, ecapa_loss=0.0001521, whisper_loss=0.1108, over 16284.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01043, ecapa_loss=0.0001453, whisper_loss=0.08957, over 3819445.94 frames. 
], batch size: 64, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:40:11,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3209520.0, ans=0.125 2024-08-15 13:40:29,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3209620.0, ans=0.035 2024-08-15 13:40:35,937 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2150, loss[loss=0.115, beats_loss=0.009476, ecapa_loss=0.0001259, whisper_loss=0.1043, over 14893.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01052, ecapa_loss=0.0001451, whisper_loss=0.08948, over 3814433.57 frames. ], batch size: 55, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:40:40,230 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.73 vs. limit=15.0 2024-08-15 13:40:49,097 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.347e+01 2.631e+01 2.979e+01 4.158e+01, threshold=5.262e+01, percent-clipped=0.0 2024-08-15 13:41:03,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3209920.0, ans=0.0 2024-08-15 13:41:07,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3209920.0, ans=0.1 2024-08-15 13:41:14,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3209920.0, ans=0.0 2024-08-15 13:41:49,895 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2200, loss[loss=0.1296, beats_loss=0.008497, ecapa_loss=0.0001685, whisper_loss=0.1195, over 20517.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01053, ecapa_loss=0.000146, whisper_loss=0.0896, over 3798872.64 frames. 
], batch size: 82, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:42:14,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3210320.0, ans=0.1 2024-08-15 13:42:15,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3210320.0, ans=0.125 2024-08-15 13:42:17,807 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-15 13:42:26,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3210420.0, ans=0.2 2024-08-15 13:42:38,639 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-15 13:43:04,758 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2250, loss[loss=0.08913, beats_loss=0.01357, ecapa_loss=0.0001028, whisper_loss=0.07453, over 20318.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01054, ecapa_loss=0.000148, whisper_loss=0.08993, over 3783828.86 frames. ], batch size: 80, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:43:07,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3210720.0, ans=0.04949747468305833 2024-08-15 13:43:17,995 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.315e+01 2.592e+01 2.973e+01 1.052e+02, threshold=5.184e+01, percent-clipped=4.0 2024-08-15 13:43:18,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3210820.0, ans=0.125 2024-08-15 13:43:34,718 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 
23 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-15 13:43:39,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3210920.0, ans=0.125 2024-08-15 13:43:45,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3210920.0, ans=0.125 2024-08-15 13:43:47,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3210920.0, ans=0.0 2024-08-15 13:43:51,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3211020.0, ans=0.125 2024-08-15 13:43:56,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3211020.0, ans=0.125 2024-08-15 13:44:16,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3211120.0, ans=0.0 2024-08-15 13:44:20,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3211220.0, ans=0.125 2024-08-15 13:44:21,482 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2300, loss[loss=0.09818, beats_loss=0.01223, ecapa_loss=0.0001381, whisper_loss=0.08457, over 19752.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01057, ecapa_loss=0.0001484, whisper_loss=0.08997, over 3831541.24 frames. ], batch size: 78, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:44:32,239 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 
19 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-15 13:44:48,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3211320.0, ans=0.125 2024-08-15 13:44:55,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3211420.0, ans=0.0 2024-08-15 13:45:01,559 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.75 vs. limit=15.0 2024-08-15 13:45:07,397 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.14 vs. limit=15.0 2024-08-15 13:45:23,481 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 14 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 13:45:32,805 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 13:45:47,921 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2350, loss[loss=0.09313, beats_loss=0.008852, ecapa_loss=0.0001806, whisper_loss=0.08247, over 19722.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.0001489, whisper_loss=0.0905, over 3846714.43 frames. ], batch size: 83, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:45:50,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3211720.0, ans=0.125 2024-08-15 13:46:02,435 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
21 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-15 13:46:03,948 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.354e+01 2.614e+01 2.902e+01 1.801e+02, threshold=5.228e+01, percent-clipped=1.0 2024-08-15 13:46:13,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3211820.0, ans=0.125 2024-08-15 13:46:14,969 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0751657783985138, model_norm_threshold=52.2847900390625 2024-08-15 13:46:15,146 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.469e+04, grad_sumsq=8.469e+04, orig_rms_sq=1.000e+00 2024-08-15 13:46:15,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3211820.0, ans=0.125 2024-08-15 13:46:27,278 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 21 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-15 13:46:29,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3211920.0, ans=0.2 2024-08-15 13:46:34,079 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.99 vs. limit=5.0 2024-08-15 13:47:01,761 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.67 vs. limit=22.5 2024-08-15 13:47:13,686 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2400, loss[loss=0.1096, beats_loss=0.01033, ecapa_loss=0.0001511, whisper_loss=0.09778, over 21674.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01057, ecapa_loss=0.0001482, whisper_loss=0.0908, over 3887137.45 frames. 
], batch size: 85, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:47:23,054 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.79 vs. limit=15.0 2024-08-15 13:47:35,096 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.12 vs. limit=10.0 2024-08-15 13:47:36,439 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.588e-02 2024-08-15 13:47:36,659 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.87 vs. limit=15.0 2024-08-15 13:48:09,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3212520.0, ans=0.125 2024-08-15 13:48:14,682 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 29 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-15 13:48:25,933 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.58 vs. limit=22.5 2024-08-15 13:48:27,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3212620.0, ans=0.125 2024-08-15 13:48:35,894 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2450, loss[loss=0.09549, beats_loss=0.01087, ecapa_loss=0.0001383, whisper_loss=0.08324, over 22383.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01055, ecapa_loss=0.0001479, whisper_loss=0.09038, over 3875234.96 frames. ], batch size: 87, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:48:38,355 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.13 vs. 
limit=15.0 2024-08-15 13:48:46,655 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-08-15 13:48:51,652 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.196e+01 2.471e+01 2.708e+01 6.956e+02, threshold=4.941e+01, percent-clipped=1.0 2024-08-15 13:48:55,372 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-15 13:49:02,582 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.15 vs. limit=15.0 2024-08-15 13:49:09,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3212920.0, ans=0.125 2024-08-15 13:49:26,981 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.34 vs. limit=10.0 2024-08-15 13:49:34,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3213020.0, ans=0.125 2024-08-15 13:49:46,716 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 12 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-15 13:49:54,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3213120.0, ans=0.125 2024-08-15 13:49:57,814 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2500, loss[loss=0.093, beats_loss=0.009505, ecapa_loss=0.0001509, whisper_loss=0.08198, over 20763.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01051, ecapa_loss=0.0001483, whisper_loss=0.09037, over 3876712.58 frames. 
], batch size: 83, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:50:00,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3213220.0, ans=0.1 2024-08-15 13:50:06,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3213220.0, ans=0.125 2024-08-15 13:50:20,438 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 19 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 13:50:29,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3213320.0, ans=0.125 2024-08-15 13:50:41,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3213420.0, ans=0.09899494936611666 2024-08-15 13:50:43,673 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.50 vs. limit=15.0 2024-08-15 13:50:49,069 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 22 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-15 13:50:50,241 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.30 vs. limit=22.5 2024-08-15 13:50:57,854 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.79 vs. limit=22.5 2024-08-15 13:51:22,770 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2550, loss[loss=0.1166, beats_loss=0.009409, ecapa_loss=0.0001784, whisper_loss=0.1054, over 21778.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0105, ecapa_loss=0.0001473, whisper_loss=0.09054, over 3875895.08 frames. 
], batch size: 90, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:51:29,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3213720.0, ans=0.0 2024-08-15 13:51:38,379 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.247e+01 2.527e+01 2.799e+01 4.421e+01, threshold=5.053e+01, percent-clipped=0.0 2024-08-15 13:51:40,385 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-15 13:51:55,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3213820.0, ans=0.125 2024-08-15 13:52:10,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3213920.0, ans=0.0 2024-08-15 13:52:13,192 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 30 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-15 13:52:23,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3214020.0, ans=0.125 2024-08-15 13:52:30,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3214020.0, ans=0.125 2024-08-15 13:52:32,881 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
21 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-15 13:52:33,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3214120.0, ans=0.2 2024-08-15 13:52:40,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3214120.0, ans=0.125 2024-08-15 13:52:51,903 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2600, loss[loss=0.0857, beats_loss=0.01156, ecapa_loss=0.0001361, whisper_loss=0.07278, over 21168.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01055, ecapa_loss=0.0001475, whisper_loss=0.09055, over 3876527.81 frames. ], batch size: 88, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:52:55,282 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 19 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-15 13:52:55,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3214220.0, ans=10.0 2024-08-15 13:53:04,098 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
24 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-15 13:53:09,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3214320.0, ans=0.04949747468305833 2024-08-15 13:53:19,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3214320.0, ans=0.1 2024-08-15 13:53:19,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3214320.0, ans=10.0 2024-08-15 13:53:48,730 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 13:53:48,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3214520.0, ans=0.125 2024-08-15 13:53:57,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3214520.0, ans=0.05 2024-08-15 13:54:02,734 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-15 13:54:09,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3214620.0, ans=0.125 2024-08-15 13:54:17,108 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2650, loss[loss=0.1212, beats_loss=0.009867, ecapa_loss=0.0001292, whisper_loss=0.1101, over 23682.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01062, ecapa_loss=0.0001474, whisper_loss=0.08974, over 3870061.23 frames. ], batch size: 89, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:54:20,178 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.03 vs. 
limit=15.0 2024-08-15 13:54:25,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3214720.0, ans=0.125 2024-08-15 13:54:32,106 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.308e+01 2.516e+01 2.935e+01 7.349e+01, threshold=5.032e+01, percent-clipped=1.0 2024-08-15 13:55:11,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3215020.0, ans=0.2 2024-08-15 13:55:19,180 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 27 from LS+wenet, 25 from Vox, 17 fro AS 2024-08-15 13:55:37,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3215120.0, ans=0.0 2024-08-15 13:55:41,762 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2700, loss[loss=0.1011, beats_loss=0.01098, ecapa_loss=0.000152, whisper_loss=0.08862, over 22640.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01071, ecapa_loss=0.0001474, whisper_loss=0.08921, over 3904254.87 frames. ], batch size: 90, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:55:44,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3215220.0, ans=0.0 2024-08-15 13:56:00,463 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.76 vs. limit=10.0 2024-08-15 13:56:29,211 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
27 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-15 13:56:29,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3215420.0, ans=0.2 2024-08-15 13:56:45,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=3215520.0, ans=0.02 2024-08-15 13:56:46,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3215520.0, ans=0.09899494936611666 2024-08-15 13:57:08,432 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2750, loss[loss=0.09023, beats_loss=0.01156, ecapa_loss=0.0001483, whisper_loss=0.07719, over 17136.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01064, ecapa_loss=0.0001466, whisper_loss=0.08979, over 3885934.30 frames. ], batch size: 71, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:57:23,637 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.384e+01 2.723e+01 3.158e+01 5.499e+01, threshold=5.446e+01, percent-clipped=1.0 2024-08-15 13:57:58,332 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 13:58:20,717 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 21 from LS+wenet, 21 from Vox, 15 fro AS 2024-08-15 13:58:29,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3216120.0, ans=0.0 2024-08-15 13:58:35,325 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2800, loss[loss=0.09705, beats_loss=0.01158, ecapa_loss=0.0001388, whisper_loss=0.08408, over 16721.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01063, ecapa_loss=0.0001474, whisper_loss=0.09053, over 3891210.95 frames. 
], batch size: 67, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:58:43,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3216220.0, ans=0.025 2024-08-15 13:58:49,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3216220.0, ans=0.0 2024-08-15 13:58:52,786 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-15 13:59:07,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3216320.0, ans=0.1 2024-08-15 13:59:15,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3216420.0, ans=0.125 2024-08-15 13:59:20,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3216420.0, ans=0.125 2024-08-15 13:59:22,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3216420.0, ans=0.125 2024-08-15 13:59:32,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3216520.0, ans=0.125 2024-08-15 13:59:42,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3216520.0, ans=0.2 2024-08-15 13:59:47,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3216620.0, ans=0.125 2024-08-15 14:00:02,958 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2850, loss[loss=0.131, beats_loss=0.01089, ecapa_loss=0.0001151, whisper_loss=0.119, over 25024.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01064, ecapa_loss=0.0001469, whisper_loss=0.09029, over 3870309.49 frames. 
], batch size: 91, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:00:15,906 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 28 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-15 14:00:19,258 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.378e+01 2.685e+01 2.976e+01 3.795e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-15 14:00:34,128 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 28 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-15 14:00:37,447 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 20 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-15 14:01:05,098 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 14 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-15 14:01:06,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3217020.0, ans=0.125 2024-08-15 14:01:27,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3217120.0, ans=0.0 2024-08-15 14:01:30,186 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2900, loss[loss=0.1003, beats_loss=0.01295, ecapa_loss=0.0001206, whisper_loss=0.08617, over 23303.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0107, ecapa_loss=0.0001472, whisper_loss=0.08962, over 3857965.23 frames. 
], batch size: 89, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:01:46,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3217320.0, ans=0.0 2024-08-15 14:01:51,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3217320.0, ans=0.125 2024-08-15 14:01:56,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3217320.0, ans=0.125 2024-08-15 14:02:05,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3217420.0, ans=0.125 2024-08-15 14:02:20,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3217520.0, ans=0.1 2024-08-15 14:02:29,902 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.06 vs. limit=15.0 2024-08-15 14:02:50,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3217720.0, ans=0.0 2024-08-15 14:02:51,774 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 2950, loss[loss=0.1058, beats_loss=0.01213, ecapa_loss=0.0001516, whisper_loss=0.0922, over 22881.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01073, ecapa_loss=0.000147, whisper_loss=0.08995, over 3861888.68 frames. 
], batch size: 91, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:02:52,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3217720.0, ans=0.125 2024-08-15 14:03:02,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3217720.0, ans=0.125 2024-08-15 14:03:06,702 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.332e+01 2.577e+01 2.863e+01 4.280e+01, threshold=5.153e+01, percent-clipped=0.0 2024-08-15 14:03:32,967 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 21 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-15 14:03:33,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3217920.0, ans=0.125 2024-08-15 14:03:39,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3217920.0, ans=0.2 2024-08-15 14:03:46,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3218020.0, ans=0.05 2024-08-15 14:03:46,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3218020.0, ans=0.125 2024-08-15 14:03:47,336 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 12 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-15 14:03:49,356 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.56 vs. limit=22.5 2024-08-15 14:03:58,153 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-15 14:04:06,499 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
39 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-15 14:04:14,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3218120.0, ans=0.1 2024-08-15 14:04:16,020 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3000, loss[loss=0.1035, beats_loss=0.01203, ecapa_loss=0.0001493, whisper_loss=0.09001, over 18634.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01075, ecapa_loss=0.0001478, whisper_loss=0.08978, over 3862057.64 frames. ], batch size: 78, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:04:16,021 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-15 14:04:54,935 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on ASR_libri: loss=0.2523, beats_loss=0, ecapa_loss=0.0005381, whisper_loss=0.2469, over 922467.00 frames. 2024-08-15 14:05:14,523 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on SV_voxceleb1: loss=0.004148, beats_loss=0, ecapa_loss=0.0004148, whisper_loss=0, over 939242.00 frames. 2024-08-15 14:07:09,303 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on AT_audioset: loss=0.02341, beats_loss=0.02341, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 14:07:09,308 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-15 14:07:24,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3218320.0, ans=0.0 2024-08-15 14:07:42,503 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
25 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-15 14:07:56,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3218420.0, ans=0.2 2024-08-15 14:08:01,693 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3218520.0, ans=0.0 2024-08-15 14:08:02,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3218520.0, ans=0.1 2024-08-15 14:08:10,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3218520.0, ans=0.025 2024-08-15 14:08:20,689 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 13 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-15 14:08:28,666 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.06 vs. limit=15.0 2024-08-15 14:08:33,888 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3050, loss[loss=0.1017, beats_loss=0.01111, ecapa_loss=0.0001913, whisper_loss=0.08865, over 14784.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01073, ecapa_loss=0.0001483, whisper_loss=0.08993, over 3890718.96 frames. ], batch size: 65, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:08:43,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3218720.0, ans=0.125 2024-08-15 14:08:51,049 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.307e+01 2.650e+01 2.894e+01 1.730e+02, threshold=5.300e+01, percent-clipped=1.0 2024-08-15 14:09:41,555 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
30 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-15 14:09:48,457 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.76 vs. limit=10.0 2024-08-15 14:09:50,958 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-15 14:10:01,166 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3100, loss[loss=0.09769, beats_loss=0.01252, ecapa_loss=0.0001205, whisper_loss=0.08396, over 15500.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01074, ecapa_loss=0.0001489, whisper_loss=0.09026, over 3864818.45 frames. ], batch size: 61, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:10:02,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3219220.0, ans=0.125 2024-08-15 14:10:02,428 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0 2024-08-15 14:10:03,368 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-15 14:10:03,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3219220.0, ans=0.0 2024-08-15 14:10:05,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3219220.0, ans=0.125 2024-08-15 14:10:11,305 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 21 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-15 14:10:20,002 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.12 vs. limit=22.5 2024-08-15 14:10:30,976 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
24 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-15 14:10:34,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3219420.0, ans=0.0 2024-08-15 14:10:48,776 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 14:10:54,789 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 16 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-15 14:11:08,006 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 17 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-15 14:11:12,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3219620.0, ans=0.125 2024-08-15 14:11:22,796 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3150, loss[loss=0.1106, beats_loss=0.01083, ecapa_loss=0.0001639, whisper_loss=0.09811, over 21418.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01078, ecapa_loss=0.0001486, whisper_loss=0.08987, over 3862835.82 frames. ], batch size: 86, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:11:32,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3219720.0, ans=0.2 2024-08-15 14:11:33,318 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.58 vs. limit=22.5 2024-08-15 14:11:34,711 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.87 vs. limit=15.0 2024-08-15 14:11:36,919 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
23 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-15 14:11:38,437 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.945e+01 2.271e+01 2.467e+01 2.810e+01 4.738e+01, threshold=4.935e+01, percent-clipped=0.0 2024-08-15 14:11:49,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3219820.0, ans=0.0 2024-08-15 14:11:53,142 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.12 vs. limit=10.0 2024-08-15 14:12:09,797 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-15 14:12:22,415 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 40 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-15 14:12:25,119 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 14:12:31,404 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.53 vs. limit=22.5 2024-08-15 14:12:40,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3220120.0, ans=0.125 2024-08-15 14:12:48,121 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3200, loss[loss=0.1086, beats_loss=0.01063, ecapa_loss=0.0001412, whisper_loss=0.09652, over 19748.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01079, ecapa_loss=0.0001494, whisper_loss=0.09003, over 3878926.89 frames. ], batch size: 81, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:13:25,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3220420.0, ans=0.125 2024-08-15 14:13:26,442 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 
21 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-15 14:13:29,803 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-15 14:13:36,473 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-15 14:13:39,018 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0 2024-08-15 14:13:41,591 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 11 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-15 14:13:43,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3220520.0, ans=0.1 2024-08-15 14:13:45,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3220520.0, ans=0.125 2024-08-15 14:13:45,695 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=6.0 2024-08-15 14:13:48,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3220520.0, ans=0.125 2024-08-15 14:13:53,841 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.33 vs. limit=15.0 2024-08-15 14:14:14,137 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3250, loss[loss=0.1027, beats_loss=0.01192, ecapa_loss=0.0001338, whisper_loss=0.08949, over 22498.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01072, ecapa_loss=0.0001496, whisper_loss=0.09086, over 3881432.32 frames. 
], batch size: 92, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:14:16,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3220720.0, ans=0.125 2024-08-15 14:14:29,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3220720.0, ans=0.125 2024-08-15 14:14:30,843 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.376e+01 2.667e+01 3.123e+01 1.417e+02, threshold=5.334e+01, percent-clipped=1.0 2024-08-15 14:14:38,712 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 17 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-15 14:14:53,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3220920.0, ans=0.2 2024-08-15 14:14:59,660 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 28 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-15 14:15:23,223 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 30 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-15 14:15:38,088 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3300, loss[loss=0.1034, beats_loss=0.009131, ecapa_loss=0.0001816, whisper_loss=0.09244, over 17906.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01067, ecapa_loss=0.0001502, whisper_loss=0.09099, over 3878275.19 frames. ], batch size: 75, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:16:02,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3221320.0, ans=0.2 2024-08-15 14:16:03,745 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-15 14:16:09,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3221320.0, ans=0.125 2024-08-15 14:16:10,541 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
18 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-15 14:16:22,873 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-15 14:16:44,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3221520.0, ans=0.1 2024-08-15 14:16:54,208 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 18 from Vox, 49 fro AS 2024-08-15 14:17:02,110 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.99 vs. limit=10.0 2024-08-15 14:17:04,239 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3350, loss[loss=0.1141, beats_loss=0.009549, ecapa_loss=0.0001971, whisper_loss=0.1025, over 21967.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01061, ecapa_loss=0.0001504, whisper_loss=0.09093, over 3861204.62 frames. ], batch size: 92, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:17:07,969 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 19 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-15 14:17:10,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3221720.0, ans=0.0 2024-08-15 14:17:14,580 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-15 14:17:19,368 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.262e+01 2.579e+01 2.848e+01 8.552e+01, threshold=5.158e+01, percent-clipped=1.0 2024-08-15 14:17:28,039 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-15 14:17:37,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3221920.0, ans=0.1 2024-08-15 14:17:50,748 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 
26 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-15 14:18:13,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3222120.0, ans=0.1 2024-08-15 14:18:29,817 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3400, loss[loss=0.09835, beats_loss=0.01124, ecapa_loss=0.0001183, whisper_loss=0.08592, over 22790.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01065, ecapa_loss=0.0001486, whisper_loss=0.09107, over 3875992.84 frames. ], batch size: 90, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:18:32,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3222220.0, ans=0.1 2024-08-15 14:18:35,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3222220.0, ans=0.2 2024-08-15 14:18:41,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3222220.0, ans=0.04949747468305833 2024-08-15 14:18:48,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3222320.0, ans=0.125 2024-08-15 14:19:03,757 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=15.0 2024-08-15 14:19:18,360 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.90 vs. limit=15.0 2024-08-15 14:19:34,311 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
23 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-15 14:19:36,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3222620.0, ans=0.0 2024-08-15 14:19:51,254 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3450, loss[loss=0.1073, beats_loss=0.009311, ecapa_loss=0.0001792, whisper_loss=0.09618, over 22409.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01063, ecapa_loss=0.0001495, whisper_loss=0.0907, over 3858078.10 frames. ], batch size: 89, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:19:56,979 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-15 14:20:02,233 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 35 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 14:20:07,408 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.344e+01 2.608e+01 2.883e+01 4.857e+01, threshold=5.217e+01, percent-clipped=0.0 2024-08-15 14:20:25,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3222920.0, ans=0.0 2024-08-15 14:20:31,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3222920.0, ans=0.0 2024-08-15 14:20:40,271 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. 
limit=6.0 2024-08-15 14:20:42,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3223020.0, ans=0.125 2024-08-15 14:20:44,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3223020.0, ans=0.0 2024-08-15 14:20:50,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3223020.0, ans=0.2 2024-08-15 14:20:52,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3223020.0, ans=0.125 2024-08-15 14:21:05,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=3223120.0, ans=0.1 2024-08-15 14:21:17,676 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3500, loss[loss=0.08437, beats_loss=0.01183, ecapa_loss=0.0001665, whisper_loss=0.07088, over 22447.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01069, ecapa_loss=0.0001496, whisper_loss=0.08984, over 3880742.12 frames. ], batch size: 96, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:21:35,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3223320.0, ans=0.125 2024-08-15 14:22:02,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3223420.0, ans=0.2 2024-08-15 14:22:06,724 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 15 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-15 14:22:23,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3223520.0, ans=0.04949747468305833 2024-08-15 14:22:36,562 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.60 vs. 
limit=15.0 2024-08-15 14:22:37,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3223620.0, ans=0.125 2024-08-15 14:22:49,477 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3550, loss[loss=0.1061, beats_loss=0.01074, ecapa_loss=0.0001254, whisper_loss=0.09408, over 23761.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01068, ecapa_loss=0.0001489, whisper_loss=0.08916, over 3862233.78 frames. ], batch size: 90, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:23:02,787 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.289e+01 2.498e+01 2.772e+01 4.287e+01, threshold=4.995e+01, percent-clipped=0.0 2024-08-15 14:23:21,325 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 21 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-15 14:23:34,164 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 22 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-15 14:23:36,143 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-15 14:24:14,365 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-15 14:24:25,624 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3600, loss[loss=0.09659, beats_loss=0.01228, ecapa_loss=0.0001564, whisper_loss=0.08275, over 21589.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01059, ecapa_loss=0.0001494, whisper_loss=0.0903, over 3884009.10 frames. ], batch size: 87, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:25:33,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3224520.0, ans=0.05 2024-08-15 14:26:01,782 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
26 from LS+wenet, 17 from Vox, 50 fro AS 2024-08-15 14:26:09,052 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3650, loss[loss=0.1222, beats_loss=0.008932, ecapa_loss=0.0001253, whisper_loss=0.112, over 22698.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01063, ecapa_loss=0.0001509, whisper_loss=0.08986, over 3876289.23 frames. ], batch size: 84, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:26:23,693 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3224720.0, ans=0.0 2024-08-15 14:26:24,021 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.01 vs. limit=15.0 2024-08-15 14:26:30,267 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.327e+01 2.522e+01 2.934e+01 4.655e+01, threshold=5.044e+01, percent-clipped=0.0 2024-08-15 14:26:30,398 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-15 14:26:34,032 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-15 14:26:56,802 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
26 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-15 14:27:09,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3224920.0, ans=0.1 2024-08-15 14:27:34,753 WARNING [optim.py:496] (2/4) Scaling gradients by 0.05073240399360657, model_norm_threshold=50.43817901611328 2024-08-15 14:27:34,916 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.552e+05, grad_sumsq=1.540e+07, orig_rms_sq=1.008e-02 2024-08-15 14:27:53,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=3225020.0, ans=15.0 2024-08-15 14:28:01,912 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.53 vs. limit=22.5 2024-08-15 14:28:04,555 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.07 vs. limit=22.5 2024-08-15 14:28:23,219 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3700, loss[loss=0.1146, beats_loss=0.01006, ecapa_loss=0.0001267, whisper_loss=0.1033, over 22288.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01066, ecapa_loss=0.0001508, whisper_loss=0.09003, over 3876185.00 frames. ], batch size: 84, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:28:28,328 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.95 vs. 
limit=15.0
2024-08-15 14:28:31,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3225220.0, ans=0.1
2024-08-15 14:29:14,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3225320.0, ans=0.125
2024-08-15 14:29:19,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3225420.0, ans=0.125
2024-08-15 14:29:30,663 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.51 vs. limit=15.0
2024-08-15 14:29:32,268 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.07 vs. limit=22.5
2024-08-15 14:29:49,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3225520.0, ans=10.0
2024-08-15 14:30:11,160 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 17 from Vox, 43 from AS
2024-08-15 14:30:34,439 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 20 from Vox, 31 from AS
2024-08-15 14:30:37,133 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3750, loss[loss=0.1065, beats_loss=0.01035, ecapa_loss=0.0001258, whisper_loss=0.09491, over 19180.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01063, ecapa_loss=0.0001499, whisper_loss=0.09056, over 3868844.57 frames. ], batch size: 73, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:31:00,453 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.295e+01 2.515e+01 2.786e+01 9.942e+02, threshold=5.030e+01, percent-clipped=1.0
2024-08-15 14:31:07,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3225820.0, ans=0.0
2024-08-15 14:31:20,744 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 from AS
2024-08-15 14:31:38,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3225920.0, ans=0.125
2024-08-15 14:31:41,000 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 20 from Vox, 21 from AS
2024-08-15 14:32:01,820 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.27 vs. limit=10.0
2024-08-15 14:32:04,456 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 24 from LS+wenet, 19 from Vox, 38 from AS
2024-08-15 14:32:04,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3226020.0, ans=0.125
2024-08-15 14:32:04,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3226020.0, ans=0.125
2024-08-15 14:32:12,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3226020.0, ans=0.125
2024-08-15 14:32:32,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3226120.0, ans=0.125
2024-08-15 14:32:38,020 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3800, loss[loss=0.1067, beats_loss=0.01065, ecapa_loss=0.0001344, whisper_loss=0.09472, over 22744.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01063, ecapa_loss=0.0001499, whisper_loss=0.0908, over 3892697.63 frames. ], batch size: 87, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:32:59,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3226320.0, ans=0.125
2024-08-15 14:33:05,014 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 15 from LS+wenet, 17 from Vox, 27 from AS
2024-08-15 14:33:06,206 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.80 vs. limit=15.0
2024-08-15 14:33:25,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3226420.0, ans=0.125
2024-08-15 14:34:10,461 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3850, loss[loss=0.1046, beats_loss=0.01002, ecapa_loss=0.0001889, whisper_loss=0.09266, over 17916.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0107, ecapa_loss=0.0001487, whisper_loss=0.0905, over 3918629.22 frames. ], batch size: 75, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:34:16,222 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 23 from Vox, 30 from AS
2024-08-15 14:34:19,712 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 25 from LS+wenet, 26 from Vox, 33 from AS
2024-08-15 14:34:27,402 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.293e+01 2.527e+01 2.817e+01 3.723e+01, threshold=5.053e+01, percent-clipped=0.0
2024-08-15 14:34:27,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3226820.0, ans=0.125
2024-08-15 14:34:28,103 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.71 vs. limit=10.0
2024-08-15 14:34:38,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3226820.0, ans=0.0
2024-08-15 14:35:07,740 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 18 from Vox, 46 from AS
2024-08-15 14:35:11,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3227020.0, ans=0.09899494936611666
2024-08-15 14:35:19,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3227020.0, ans=0.1
2024-08-15 14:35:24,886 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.10 vs. limit=15.0
2024-08-15 14:35:33,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3227120.0, ans=0.2
2024-08-15 14:35:41,772 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 21 from LS+wenet, 25 from Vox, 38 from AS
2024-08-15 14:35:43,152 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3900, loss[loss=0.09269, beats_loss=0.01196, ecapa_loss=0.0001704, whisper_loss=0.07902, over 19763.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01066, ecapa_loss=0.0001492, whisper_loss=0.09067, over 3906265.05 frames. ], batch size: 84, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:35:45,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3227220.0, ans=10.0
2024-08-15 14:35:50,146 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.302e+00
2024-08-15 14:36:15,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3227320.0, ans=0.1
2024-08-15 14:36:28,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3227420.0, ans=0.2
2024-08-15 14:36:37,278 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.78 vs. limit=15.0
2024-08-15 14:36:38,236 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-15 14:36:47,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3227520.0, ans=0.125
2024-08-15 14:36:47,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3227520.0, ans=0.07
2024-08-15 14:37:00,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3227620.0, ans=0.125
2024-08-15 14:37:01,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3227620.0, ans=0.125
2024-08-15 14:37:10,539 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 3950, loss[loss=0.1091, beats_loss=0.009227, ecapa_loss=0.0001573, whisper_loss=0.09832, over 15848.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01054, ecapa_loss=0.0001497, whisper_loss=0.09243, over 3912582.33 frames. ], batch size: 61, lr: 2.72e-03, grad_scale: 1.152921504606847e+18
2024-08-15 14:37:26,156 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.465e+01 2.719e+01 3.087e+01 1.515e+02, threshold=5.437e+01, percent-clipped=3.0
2024-08-15 14:37:32,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3227820.0, ans=0.0
2024-08-15 14:37:39,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3227820.0, ans=0.125
2024-08-15 14:37:40,669 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 27 from Vox, 42 from AS
2024-08-15 14:38:01,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3227920.0, ans=0.035
2024-08-15 14:38:08,101 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 10 from Vox, 31 from AS
2024-08-15 14:38:37,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3228120.0, ans=0.125
2024-08-15 14:38:39,185 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.45 vs. limit=15.0
2024-08-15 14:38:39,651 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4000, loss[loss=0.08755, beats_loss=0.01135, ecapa_loss=0.0001916, whisper_loss=0.07428, over 20161.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01045, ecapa_loss=0.0001508, whisper_loss=0.09265, over 3918531.26 frames. ], batch size: 88, lr: 2.72e-03, grad_scale: 1.152921504606847e+18
2024-08-15 14:38:42,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3228220.0, ans=0.125
2024-08-15 14:38:42,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3228220.0, ans=0.125
2024-08-15 14:38:55,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3228320.0, ans=0.125
2024-08-15 14:39:32,113 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 34 from LS+wenet, 16 from Vox, 36 from AS
2024-08-15 14:39:56,520 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.18 vs. limit=15.0
2024-08-15 14:40:05,356 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4050, loss[loss=0.09977, beats_loss=0.01346, ecapa_loss=0.0001176, whisper_loss=0.08514, over 22807.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0105, ecapa_loss=0.0001497, whisper_loss=0.09173, over 3863038.05 frames. ], batch size: 92, lr: 2.72e-03, grad_scale: 1.152921504606847e+18
2024-08-15 14:40:10,708 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 30 from LS+wenet, 21 from Vox, 33 from AS
2024-08-15 14:40:20,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3228720.0, ans=0.125
2024-08-15 14:40:24,190 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.850e+01 2.316e+01 2.614e+01 2.943e+01 4.388e+01, threshold=5.229e+01, percent-clipped=0.0
2024-08-15 14:40:32,796 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 26 from LS+wenet, 24 from Vox, 33 from AS
2024-08-15 14:40:33,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3228820.0, ans=0.0
2024-08-15 14:40:47,933 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 28 from Vox, 36 from AS
2024-08-15 14:40:48,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3228920.0, ans=0.125
2024-08-15 14:40:58,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3228920.0, ans=0.125
2024-08-15 14:41:15,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3229020.0, ans=0.125
2024-08-15 14:41:15,565 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0
2024-08-15 14:41:37,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3229120.0, ans=0.125
2024-08-15 14:41:58,867 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4100, loss[loss=0.1018, beats_loss=0.01252, ecapa_loss=0.0001311, whisper_loss=0.08799, over 17536.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01062, ecapa_loss=0.00015, whisper_loss=0.0909, over 3866150.15 frames. ], batch size: 71, lr: 2.72e-03, grad_scale: 1.152921504606847e+18
2024-08-15 14:41:59,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3229220.0, ans=0.0
2024-08-15 14:42:03,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3229220.0, ans=0.0
2024-08-15 14:42:20,843 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 34 from LS+wenet, 30 from Vox, 24 from AS
2024-08-15 14:42:21,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3229320.0, ans=0.125
2024-08-15 14:42:33,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3229320.0, ans=0.0
2024-08-15 14:42:38,079 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0
2024-08-15 14:42:40,359 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 29 from Vox, 27 from AS
2024-08-15 14:42:55,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3229420.0, ans=0.0
2024-08-15 14:43:12,961 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 19 from Vox, 31 from AS
2024-08-15 14:43:32,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3229520.0, ans=0.05
2024-08-15 14:44:04,046 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4150, loss[loss=0.09502, beats_loss=0.009399, ecapa_loss=0.0001796, whisper_loss=0.08382, over 14726.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0106, ecapa_loss=0.0001517, whisper_loss=0.09147, over 3890708.00 frames. ], batch size: 61, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:44:13,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3229720.0, ans=0.125
2024-08-15 14:44:28,204 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.317e+01 2.580e+01 2.886e+01 4.298e+01, threshold=5.160e+01, percent-clipped=0.0
2024-08-15 14:44:33,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3229820.0, ans=0.125
2024-08-15 14:44:41,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3229820.0, ans=0.125
2024-08-15 14:44:50,544 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 19 from Vox, 49 from AS
2024-08-15 14:44:50,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3229920.0, ans=0.1
2024-08-15 14:44:54,956 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 19 from Vox, 40 from AS
2024-08-15 14:45:03,793 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0
2024-08-15 14:45:17,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3230120.0, ans=0.125
2024-08-15 14:45:33,801 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4200, loss[loss=0.09616, beats_loss=0.01208, ecapa_loss=0.0001593, whisper_loss=0.08248, over 18822.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01061, ecapa_loss=0.0001511, whisper_loss=0.09117, over 3881677.88 frames. ], batch size: 79, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:45:46,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3230220.0, ans=0.125
2024-08-15 14:45:48,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3230220.0, ans=0.125
2024-08-15 14:45:49,531 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 23 from LS+wenet, 16 from Vox, 35 from AS
2024-08-15 14:46:02,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=3230320.0, ans=15.0
2024-08-15 14:46:03,034 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 27 from LS+wenet, 13 from Vox, 26 from AS
2024-08-15 14:46:42,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3230620.0, ans=0.125
2024-08-15 14:46:45,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3230620.0, ans=0.0
2024-08-15 14:46:53,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3230620.0, ans=0.1
2024-08-15 14:47:02,383 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4250, loss[loss=0.1068, beats_loss=0.01129, ecapa_loss=0.0001142, whisper_loss=0.09435, over 16768.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01059, ecapa_loss=0.000152, whisper_loss=0.09097, over 3881466.04 frames. ], batch size: 64, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:47:07,489 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 from AS
2024-08-15 14:47:07,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3230720.0, ans=0.125
2024-08-15 14:47:11,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3230720.0, ans=0.125
2024-08-15 14:47:20,522 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.308e+01 2.518e+01 2.859e+01 8.550e+01, threshold=5.036e+01, percent-clipped=1.0
2024-08-15 14:47:23,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3230820.0, ans=0.0
2024-08-15 14:47:30,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3230820.0, ans=0.125
2024-08-15 14:47:32,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3230820.0, ans=0.0
2024-08-15 14:47:36,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3230820.0, ans=0.0
2024-08-15 14:47:36,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3230820.0, ans=0.125
2024-08-15 14:47:38,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3230920.0, ans=0.125
2024-08-15 14:47:50,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3230920.0, ans=0.0
2024-08-15 14:47:51,781 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 23 from Vox, 46 from AS
2024-08-15 14:47:59,248 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 26 from LS+wenet, 21 from Vox, 32 from AS
2024-08-15 14:48:06,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3231020.0, ans=0.0
2024-08-15 14:48:10,922 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 from AS
2024-08-15 14:48:17,191 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 from AS
2024-08-15 14:48:31,667 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 18 from Vox, 31 from AS
2024-08-15 14:48:34,507 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4300, loss[loss=0.06679, beats_loss=0.0155, ecapa_loss=9.764e-05, whisper_loss=0.05032, over 13111.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01065, ecapa_loss=0.0001504, whisper_loss=0.09042, over 3872046.32 frames. ], batch size: 54, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:48:51,422 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 22 from LS+wenet, 21 from Vox, 35 from AS
2024-08-15 14:48:51,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3231320.0, ans=0.1
2024-08-15 14:48:56,143 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 14 from Vox, 27 from AS
2024-08-15 14:49:42,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3231620.0, ans=0.125
2024-08-15 14:49:59,341 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4350, loss[loss=0.09019, beats_loss=0.01231, ecapa_loss=0.0001487, whisper_loss=0.07639, over 16951.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01059, ecapa_loss=0.0001504, whisper_loss=0.09093, over 3902740.69 frames. ], batch size: 69, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:50:07,554 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 17 from LS+wenet, 24 from Vox, 16 from AS
2024-08-15 14:50:13,759 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 13 from LS+wenet, 22 from Vox, 21 from AS
2024-08-15 14:50:17,888 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.376e+01 2.619e+01 2.961e+01 5.969e+01, threshold=5.237e+01, percent-clipped=2.0
2024-08-15 14:51:07,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3232020.0, ans=0.0
2024-08-15 14:51:09,951 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 35 from LS+wenet, 22 from Vox, 33 from AS
2024-08-15 14:51:11,293 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 32 from Vox, 32 from AS
2024-08-15 14:51:28,311 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4400, loss[loss=0.1062, beats_loss=0.01076, ecapa_loss=0.0001285, whisper_loss=0.09411, over 16443.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01063, ecapa_loss=0.0001496, whisper_loss=0.09042, over 3877253.03 frames. ], batch size: 62, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:52:15,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3232420.0, ans=0.125
2024-08-15 14:52:51,298 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4450, loss[loss=0.0948, beats_loss=0.01031, ecapa_loss=0.0001552, whisper_loss=0.08294, over 19898.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01054, ecapa_loss=0.0001501, whisper_loss=0.09077, over 3869311.44 frames. ], batch size: 81, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:53:08,830 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.602e+01 2.314e+01 2.575e+01 2.815e+01 3.995e+01, threshold=5.150e+01, percent-clipped=0.0
2024-08-15 14:53:18,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3232820.0, ans=0.05
2024-08-15 14:53:18,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3232820.0, ans=0.125
2024-08-15 14:53:26,826 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 24 from Vox, 34 from AS
2024-08-15 14:53:29,216 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.63 vs. limit=15.0
2024-08-15 14:53:38,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3232920.0, ans=0.125
2024-08-15 14:53:49,120 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 from AS
2024-08-15 14:53:57,093 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0
2024-08-15 14:54:05,771 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 22 from LS+wenet, 19 from Vox, 21 from AS
2024-08-15 14:54:25,046 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4500, loss[loss=0.09032, beats_loss=0.01224, ecapa_loss=0.0001377, whisper_loss=0.07671, over 22231.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01058, ecapa_loss=0.0001496, whisper_loss=0.09078, over 3875098.10 frames. ], batch size: 90, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:54:25,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3233220.0, ans=0.125
2024-08-15 14:54:33,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3233220.0, ans=0.0
2024-08-15 14:54:45,757 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.22 vs. limit=10.0
2024-08-15 14:55:04,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3233420.0, ans=0.1
2024-08-15 14:55:10,471 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 from AS
2024-08-15 14:55:25,612 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 27 from Vox, 26 from AS
2024-08-15 14:55:44,368 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 15 from LS+wenet, 19 from Vox, 22 from AS
2024-08-15 14:55:51,204 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4550, loss[loss=0.1021, beats_loss=0.009861, ecapa_loss=0.0001434, whisper_loss=0.09077, over 21957.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01057, ecapa_loss=0.0001516, whisper_loss=0.09074, over 3872699.98 frames. ], batch size: 91, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:56:07,616 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.406e+01 2.642e+01 2.962e+01 1.202e+02, threshold=5.284e+01, percent-clipped=1.0
2024-08-15 14:56:18,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3233820.0, ans=0.125
2024-08-15 14:56:54,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3234020.0, ans=0.125
2024-08-15 14:57:08,213 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.53 vs. limit=15.0
2024-08-15 14:57:09,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3234120.0, ans=0.0
2024-08-15 14:57:17,955 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-15 14:57:18,819 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4600, loss[loss=0.1059, beats_loss=0.009626, ecapa_loss=0.0001841, whisper_loss=0.09444, over 22091.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001524, whisper_loss=0.09032, over 3844957.16 frames. ], batch size: 91, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:57:23,716 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 31 from LS+wenet, 16 from Vox, 32 from AS
2024-08-15 14:57:46,390 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 26 from Vox, 37 from AS
2024-08-15 14:57:47,001 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.87 vs. limit=15.0
2024-08-15 14:57:52,879 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 13 from LS+wenet, 17 from Vox, 26 from AS
2024-08-15 14:57:54,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3234420.0, ans=0.0
2024-08-15 14:58:02,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3234420.0, ans=0.125
2024-08-15 14:58:24,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3234620.0, ans=0.125
2024-08-15 14:58:25,579 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 29 from LS+wenet, 18 from Vox, 28 from AS
2024-08-15 14:58:31,451 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 17 from LS+wenet, 15 from Vox, 21 from AS
2024-08-15 14:58:33,031 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 21 from LS+wenet, 19 from Vox, 33 from AS
2024-08-15 14:58:40,000 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4650, loss[loss=0.1265, beats_loss=0.008516, ecapa_loss=0.000131, whisper_loss=0.1166, over 19420.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01057, ecapa_loss=0.0001513, whisper_loss=0.09006, over 3856374.84 frames. ], batch size: 72, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 14:58:46,212 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-15 14:58:46,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=3234720.0, ans=22.5
2024-08-15 14:58:56,436 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.314e+01 2.498e+01 2.884e+01 4.685e+01, threshold=4.995e+01, percent-clipped=0.0
2024-08-15 14:59:11,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3234920.0, ans=0.1
2024-08-15 14:59:24,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3234920.0, ans=0.0
2024-08-15 14:59:30,372 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.96 vs. limit=15.0
2024-08-15 14:59:36,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3235020.0, ans=0.0
2024-08-15 14:59:36,957 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 14 from Vox, 31 from AS
2024-08-15 14:59:59,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3235120.0, ans=0.125
2024-08-15 15:00:08,350 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4700, loss[loss=0.08404, beats_loss=0.01116, ecapa_loss=0.0001317, whisper_loss=0.07157, over 17625.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01065, ecapa_loss=0.0001511, whisper_loss=0.08994, over 3875091.40 frames. ], batch size: 68, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 15:00:24,238 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 27 from LS+wenet, 24 from Vox, 35 from AS
2024-08-15 15:00:30,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3235320.0, ans=0.125
2024-08-15 15:00:40,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3235420.0, ans=0.0
2024-08-15 15:00:47,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3235420.0, ans=0.1
2024-08-15 15:00:49,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3235420.0, ans=0.05
2024-08-15 15:01:20,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3235620.0, ans=0.0
2024-08-15 15:01:33,465 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4750, loss[loss=0.08492, beats_loss=0.01046, ecapa_loss=0.0001894, whisper_loss=0.07257, over 17773.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0106, ecapa_loss=0.0001511, whisper_loss=0.08952, over 3833775.17 frames. ], batch size: 76, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 15:01:40,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3235720.0, ans=0.125
2024-08-15 15:01:46,721 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3235720.0, ans=0.125
2024-08-15 15:01:49,037 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.242e+01 2.450e+01 2.790e+01 3.790e+01, threshold=4.901e+01, percent-clipped=0.0
2024-08-15 15:01:49,321 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 from AS
2024-08-15 15:02:17,382 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 20 from LS+wenet, 23 from Vox, 30 from AS
2024-08-15 15:02:38,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3236120.0, ans=0.1
2024-08-15 15:02:51,655 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4800, loss[loss=0.0914, beats_loss=0.01059, ecapa_loss=0.0002005, whisper_loss=0.0788, over 17243.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01059, ecapa_loss=0.0001516, whisper_loss=0.09022, over 3869018.03 frames. ], batch size: 74, lr: 2.72e-03, grad_scale: 5.764607523034235e+17
2024-08-15 15:03:07,492 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 21 from LS+wenet, 22 from Vox, 26 from AS
2024-08-15 15:03:07,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3236320.0, ans=0.0
2024-08-15 15:03:17,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3236320.0, ans=0.0
2024-08-15 15:03:18,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3236320.0, ans=0.125
2024-08-15 15:03:27,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3236420.0, ans=0.125
2024-08-15 15:03:35,717 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 29 from Vox, 37 from AS
2024-08-15 15:03:45,625 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 27 from LS+wenet, 25 from Vox, 42 from AS
2024-08-15 15:03:51,688 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 17 from Vox, 35 from AS
2024-08-15 15:04:01,695 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.45 vs. limit=15.0
2024-08-15 15:04:09,488 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4850, loss[loss=0.1191, beats_loss=0.01079, ecapa_loss=0.00016, whisper_loss=0.1067, over 22597.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01068, ecapa_loss=0.0001518, whisper_loss=0.0905, over 3884409.58 frames. ], batch size: 94, lr: 2.71e-03, grad_scale: 5.764607523034235e+17
2024-08-15 15:04:12,673 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 30 from LS+wenet, 26 from Vox, 35 from AS
2024-08-15 15:04:24,481 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.428e+01 2.638e+01 3.060e+01 4.898e+01, threshold=5.277e+01, percent-clipped=0.0
2024-08-15 15:04:30,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3236820.0, ans=0.07
2024-08-15 15:04:36,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3236820.0, ans=0.0
2024-08-15 15:04:56,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3237020.0, ans=0.09899494936611666
2024-08-15 15:04:59,629 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.018e-02
2024-08-15 15:05:05,445 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts.
22 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-15 15:05:05,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3237020.0, ans=0.125 2024-08-15 15:05:14,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3237120.0, ans=0.1 2024-08-15 15:05:14,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3237120.0, ans=0.125 2024-08-15 15:05:18,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3237120.0, ans=0.2 2024-08-15 15:05:19,825 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2024-08-15 15:05:20,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3237220.0, ans=0.125 2024-08-15 15:05:21,937 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4900, loss[loss=0.09294, beats_loss=0.01157, ecapa_loss=0.0001195, whisper_loss=0.08017, over 17087.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01069, ecapa_loss=0.0001506, whisper_loss=0.09088, over 3870777.01 frames. ], batch size: 66, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:05:35,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3237320.0, ans=0.0 2024-08-15 15:05:41,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3237320.0, ans=10.0 2024-08-15 15:05:44,411 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-15 15:06:03,887 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
19 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-15 15:06:06,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3237520.0, ans=0.125 2024-08-15 15:06:14,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3237520.0, ans=0.0 2024-08-15 15:06:14,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3237520.0, ans=0.0 2024-08-15 15:06:17,592 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 16 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-15 15:06:25,566 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.11 vs. limit=12.0 2024-08-15 15:06:31,253 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 4950, loss[loss=0.09704, beats_loss=0.01158, ecapa_loss=0.0001612, whisper_loss=0.08385, over 20330.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0106, ecapa_loss=0.0001514, whisper_loss=0.09052, over 3822664.68 frames. 
], batch size: 81, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:06:45,364 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.025e+01 2.352e+01 2.615e+01 2.945e+01 2.370e+02, threshold=5.229e+01, percent-clipped=2.0 2024-08-15 15:06:56,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3237820.0, ans=0.125 2024-08-15 15:06:59,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3237920.0, ans=0.125 2024-08-15 15:07:06,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3237920.0, ans=0.1 2024-08-15 15:07:09,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3237920.0, ans=0.125 2024-08-15 15:07:11,173 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.93 vs. limit=15.0 2024-08-15 15:07:12,563 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.66 vs. limit=12.0 2024-08-15 15:07:16,112 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-15 15:07:26,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3238120.0, ans=0.0 2024-08-15 15:07:40,746 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5000, loss[loss=0.09944, beats_loss=0.009249, ecapa_loss=0.0001654, whisper_loss=0.08854, over 18697.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0106, ecapa_loss=0.0001519, whisper_loss=0.0909, over 3824593.18 frames. 
], batch size: 76, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:07:49,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3238220.0, ans=0.2 2024-08-15 15:07:54,397 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 18 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-15 15:08:05,500 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-15 15:08:07,550 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-15 15:08:12,884 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=7.645e-03 2024-08-15 15:08:13,878 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-15 15:08:21,555 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 17 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-15 15:08:26,659 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-15 15:08:32,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3238520.0, ans=0.0 2024-08-15 15:08:48,457 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5050, loss[loss=0.09089, beats_loss=0.01088, ecapa_loss=0.0001624, whisper_loss=0.07839, over 16804.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01067, ecapa_loss=0.0001509, whisper_loss=0.09104, over 3852552.95 frames. ], batch size: 68, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:08:57,637 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.87 vs. 
limit=22.5 2024-08-15 15:08:59,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3238720.0, ans=0.2 2024-08-15 15:09:02,033 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.271e+01 2.496e+01 2.876e+01 1.159e+02, threshold=4.993e+01, percent-clipped=2.0 2024-08-15 15:09:04,569 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 19 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-15 15:09:12,244 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.22 vs. limit=22.5 2024-08-15 15:09:17,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3238920.0, ans=0.1 2024-08-15 15:09:20,740 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 33 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-15 15:09:22,515 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.33 vs. limit=10.0 2024-08-15 15:09:36,052 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.97 vs. limit=15.0 2024-08-15 15:09:37,715 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.81 vs. limit=15.0 2024-08-15 15:09:55,792 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5100, loss[loss=0.09791, beats_loss=0.008385, ecapa_loss=0.000177, whisper_loss=0.08775, over 16668.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01068, ecapa_loss=0.0001494, whisper_loss=0.09107, over 3846307.01 frames. ], batch size: 66, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:09:57,340 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
24 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-15 15:10:01,401 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-15 15:10:03,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3239220.0, ans=0.07 2024-08-15 15:10:15,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3239320.0, ans=0.125 2024-08-15 15:10:16,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3239320.0, ans=0.0 2024-08-15 15:10:20,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3239320.0, ans=0.125 2024-08-15 15:10:34,171 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.31 vs. limit=15.0 2024-08-15 15:10:42,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3239520.0, ans=0.125 2024-08-15 15:10:45,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3239520.0, ans=0.125 2024-08-15 15:10:59,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3239620.0, ans=0.05 2024-08-15 15:11:03,284 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5150, loss[loss=0.1137, beats_loss=0.009815, ecapa_loss=0.0001594, whisper_loss=0.1023, over 15931.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01064, ecapa_loss=0.0001482, whisper_loss=0.0917, over 3866846.93 frames. ], batch size: 63, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:11:07,657 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
29 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-15 15:11:10,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3239720.0, ans=0.125 2024-08-15 15:11:16,912 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.662e+01 2.367e+01 2.663e+01 3.034e+01 8.372e+01, threshold=5.326e+01, percent-clipped=1.0 2024-08-15 15:11:25,552 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-15 15:11:28,818 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.16 vs. limit=22.5 2024-08-15 15:11:34,885 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-15 15:11:39,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3239920.0, ans=0.0 2024-08-15 15:11:43,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3239920.0, ans=0.125 2024-08-15 15:11:48,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3240020.0, ans=0.0 2024-08-15 15:11:53,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3240020.0, ans=0.0 2024-08-15 15:11:55,749 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
32 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-15 15:12:03,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3240120.0, ans=0.125 2024-08-15 15:12:09,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3240120.0, ans=0.2 2024-08-15 15:12:11,410 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 16 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-15 15:12:15,175 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5200, loss[loss=0.1166, beats_loss=0.008425, ecapa_loss=0.0001884, whisper_loss=0.1062, over 18662.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01062, ecapa_loss=0.0001484, whisper_loss=0.09149, over 3866249.50 frames. ], batch size: 75, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:12:22,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3240220.0, ans=0.1 2024-08-15 15:12:22,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3240220.0, ans=0.125 2024-08-15 15:12:50,033 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-15 15:12:52,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3240420.0, ans=10.0 2024-08-15 15:12:55,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3240420.0, ans=0.0 2024-08-15 15:13:04,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3240520.0, ans=0.05 2024-08-15 15:13:05,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3240520.0, ans=0.1 2024-08-15 15:13:15,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3240620.0, ans=10.0 2024-08-15 15:13:20,781 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-15 15:13:22,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3240620.0, ans=0.0 2024-08-15 15:13:26,222 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5250, loss[loss=0.1157, beats_loss=0.008226, ecapa_loss=0.0001615, whisper_loss=0.1059, over 15593.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01058, ecapa_loss=0.0001485, whisper_loss=0.09191, over 3862727.88 frames. ], batch size: 60, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:13:30,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3240720.0, ans=0.125 2024-08-15 15:13:40,586 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.275e+01 2.578e+01 2.785e+01 8.879e+01, threshold=5.156e+01, percent-clipped=2.0 2024-08-15 15:13:44,991 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 
22 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-15 15:13:50,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3240820.0, ans=0.125 2024-08-15 15:13:55,635 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 24 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-15 15:14:08,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3241020.0, ans=0.2 2024-08-15 15:14:23,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3241120.0, ans=0.125 2024-08-15 15:14:29,824 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 39 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-15 15:14:32,265 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 15 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 15:14:34,942 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 21 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-15 15:14:35,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3241120.0, ans=0.125 2024-08-15 15:14:36,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3241220.0, ans=0.125 2024-08-15 15:14:37,460 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5300, loss[loss=0.1375, beats_loss=0.00817, ecapa_loss=0.0001759, whisper_loss=0.1276, over 22541.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01049, ecapa_loss=0.00015, whisper_loss=0.09262, over 3902866.63 frames. ], batch size: 88, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:14:43,344 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
17 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-15 15:14:45,187 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.49 vs. limit=22.5 2024-08-15 15:14:56,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3241320.0, ans=0.125 2024-08-15 15:15:28,103 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 26 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-15 15:15:47,632 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5350, loss[loss=0.08832, beats_loss=0.01018, ecapa_loss=0.000154, whisper_loss=0.0766, over 23301.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01048, ecapa_loss=0.0001493, whisper_loss=0.09194, over 3876334.94 frames. ], batch size: 94, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:15:54,653 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-15 15:16:01,456 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.335e+01 2.708e+01 3.077e+01 2.135e+02, threshold=5.416e+01, percent-clipped=3.0 2024-08-15 15:16:13,191 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.01 vs. limit=12.0 2024-08-15 15:16:21,150 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-15 15:16:22,451 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 19 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-15 15:16:29,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3242020.0, ans=0.125 2024-08-15 15:16:31,335 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.30 vs. 
limit=22.5 2024-08-15 15:16:34,632 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-15 15:16:44,360 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 15 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-15 15:16:56,976 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5400, loss[loss=0.08522, beats_loss=0.01126, ecapa_loss=0.0001434, whisper_loss=0.07253, over 18053.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01053, ecapa_loss=0.0001488, whisper_loss=0.09137, over 3876902.37 frames. ], batch size: 74, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:17:03,579 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08951901644468307, model_norm_threshold=54.15595626831055 2024-08-15 15:17:03,759 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.765e+04, grad_sumsq=4.730e+06, orig_rms_sq=1.007e-02 2024-08-15 15:17:08,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3242220.0, ans=0.0 2024-08-15 15:17:13,702 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 23 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-15 15:17:31,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3242420.0, ans=0.125 2024-08-15 15:17:35,170 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-15 15:17:39,136 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
18 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-15 15:17:58,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3242620.0, ans=0.125 2024-08-15 15:18:00,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3242620.0, ans=0.0 2024-08-15 15:18:06,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3242720.0, ans=0.0 2024-08-15 15:18:07,656 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5450, loss[loss=0.08246, beats_loss=0.009381, ecapa_loss=0.0001723, whisper_loss=0.07135, over 15758.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01053, ecapa_loss=0.0001503, whisper_loss=0.09114, over 3882698.42 frames. ], batch size: 63, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:18:09,507 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-15 15:18:12,471 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-15 15:18:22,584 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.261e+01 2.541e+01 2.887e+01 6.050e+02, threshold=5.082e+01, percent-clipped=2.0 2024-08-15 15:18:23,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3242820.0, ans=0.125 2024-08-15 15:18:24,759 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-15 15:18:51,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3242920.0, ans=0.0 2024-08-15 15:18:54,727 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
23 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 15:19:06,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3243020.0, ans=0.125 2024-08-15 15:19:08,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3243020.0, ans=0.2 2024-08-15 15:19:08,597 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.35 vs. limit=12.0 2024-08-15 15:19:20,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3243120.0, ans=10.0 2024-08-15 15:19:26,001 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5500, loss[loss=0.09512, beats_loss=0.0114, ecapa_loss=0.0001209, whisper_loss=0.08251, over 16373.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01061, ecapa_loss=0.0001498, whisper_loss=0.091, over 3897488.20 frames. ], batch size: 64, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:19:28,492 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.91 vs. limit=15.0 2024-08-15 15:19:33,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3243220.0, ans=0.1 2024-08-15 15:19:41,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3243320.0, ans=0.125 2024-08-15 15:19:41,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3243320.0, ans=0.5 2024-08-15 15:19:47,063 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 
32 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-15 15:19:49,528 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.51 vs. limit=22.5 2024-08-15 15:19:54,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3243320.0, ans=0.2 2024-08-15 15:20:12,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3243420.0, ans=0.125 2024-08-15 15:20:14,549 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.93 vs. limit=6.0 2024-08-15 15:20:49,372 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5550, loss[loss=0.0958, beats_loss=0.01077, ecapa_loss=0.0001568, whisper_loss=0.08347, over 21200.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01058, ecapa_loss=0.0001513, whisper_loss=0.09146, over 3900895.31 frames. ], batch size: 88, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:20:53,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3243720.0, ans=0.1 2024-08-15 15:20:55,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3243720.0, ans=0.125 2024-08-15 15:21:06,155 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.298e+01 2.585e+01 2.775e+01 4.176e+01, threshold=5.170e+01, percent-clipped=0.0 2024-08-15 15:21:17,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=3243820.0, ans=0.1 2024-08-15 15:21:29,702 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
25 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 15:21:33,345 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.74 vs. limit=15.0 2024-08-15 15:21:54,099 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.77 vs. limit=15.0 2024-08-15 15:22:11,418 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.878e+00 2024-08-15 15:22:13,558 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5600, loss[loss=0.08648, beats_loss=0.01085, ecapa_loss=0.0001356, whisper_loss=0.07428, over 19534.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01063, ecapa_loss=0.0001511, whisper_loss=0.09102, over 3895550.84 frames. ], batch size: 76, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:22:18,751 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 23 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-15 15:22:28,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3244320.0, ans=0.0 2024-08-15 15:22:44,166 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 26 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-15 15:22:50,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3244420.0, ans=0.2 2024-08-15 15:22:53,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3244420.0, ans=0.1 2024-08-15 15:23:14,986 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-15 15:23:35,736 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5650, loss[loss=0.09148, beats_loss=0.01033, ecapa_loss=0.0001724, whisper_loss=0.07943, over 14825.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01061, ecapa_loss=0.0001509, whisper_loss=0.09062, over 3892580.58 frames. ], batch size: 60, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:23:49,209 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 26 from LS+wenet, 25 from Vox, 20 fro AS 2024-08-15 15:23:51,659 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 18 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-15 15:23:54,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3244820.0, ans=0.2 2024-08-15 15:23:55,555 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.317e+01 2.489e+01 2.780e+01 3.847e+01, threshold=4.978e+01, percent-clipped=0.0 2024-08-15 15:23:58,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3244820.0, ans=0.125 2024-08-15 15:24:02,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3244820.0, ans=0.1 2024-08-15 15:24:22,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3244920.0, ans=0.1 2024-08-15 15:24:24,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3244920.0, ans=0.125 2024-08-15 15:24:24,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3244920.0, ans=0.125 2024-08-15 15:24:51,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3245120.0, ans=0.0 2024-08-15 15:24:53,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3245120.0, ans=0.125 2024-08-15 15:24:54,031 INFO 
[train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 27 from Vox, 38 from AS 2024-08-15 15:24:58,106 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.52 vs. limit=15.0 2024-08-15 15:25:01,891 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5700, loss[loss=0.0805, beats_loss=0.0128, ecapa_loss=0.0001619, whisper_loss=0.06608, over 16272.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01057, ecapa_loss=0.0001512, whisper_loss=0.0912, over 3917388.25 frames. ], batch size: 69, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:25:09,102 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 21 from Vox, 27 from AS 2024-08-15 15:25:14,033 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 from AS 2024-08-15 15:25:35,291 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.48 vs. limit=10.0 2024-08-15 15:25:49,645 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.19 vs. limit=22.5 2024-08-15 15:25:52,041 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 17 from Vox, 39 from AS 2024-08-15 15:25:53,653 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 16 from LS+wenet, 13 from Vox, 32 from AS 2024-08-15 15:25:54,247 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.12 vs. 
limit=10.0 2024-08-15 15:26:00,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3245520.0, ans=0.2 2024-08-15 15:26:05,164 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.87 vs. limit=15.0 2024-08-15 15:26:23,444 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 22 from LS+wenet, 13 from Vox, 25 from AS 2024-08-15 15:26:26,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3245620.0, ans=0.2 2024-08-15 15:26:28,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3245720.0, ans=0.2 2024-08-15 15:26:28,887 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5750, loss[loss=0.09188, beats_loss=0.01204, ecapa_loss=0.000135, whisper_loss=0.07848, over 16693.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01063, ecapa_loss=0.0001499, whisper_loss=0.09081, over 3895195.64 frames. ], batch size: 64, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:26:38,429 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 20 from LS+wenet, 18 from Vox, 37 from AS 2024-08-15 15:26:46,365 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.388e+01 2.608e+01 2.971e+01 1.987e+02, threshold=5.216e+01, percent-clipped=2.0 2024-08-15 15:26:52,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3245820.0, ans=0.0 2024-08-15 15:26:57,133 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.75 vs. limit=10.0 2024-08-15 15:27:10,102 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
33 from LS+wenet, 19 from Vox, 42 from AS 2024-08-15 15:27:14,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3245920.0, ans=0.0 2024-08-15 15:27:25,822 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 26 from Vox, 36 from AS 2024-08-15 15:27:44,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3246120.0, ans=0.2 2024-08-15 15:27:51,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3246120.0, ans=0.0 2024-08-15 15:27:53,856 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5800, loss[loss=0.1045, beats_loss=0.009359, ecapa_loss=0.0001484, whisper_loss=0.0937, over 16966.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01065, ecapa_loss=0.0001496, whisper_loss=0.09038, over 3888674.36 frames. ], batch size: 66, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:27:56,228 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.58 vs. limit=15.0 2024-08-15 15:28:00,694 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 30 from Vox, 37 from AS 2024-08-15 15:28:04,382 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.17 vs. limit=15.0 2024-08-15 15:28:07,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3246220.0, ans=0.125 2024-08-15 15:28:22,055 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.38 vs. 
limit=15.0 2024-08-15 15:28:24,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3246420.0, ans=0.0 2024-08-15 15:28:37,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3246420.0, ans=0.0 2024-08-15 15:28:37,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3246420.0, ans=0.125 2024-08-15 15:28:58,074 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 24 from Vox, 32 from AS 2024-08-15 15:29:04,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3246620.0, ans=0.2 2024-08-15 15:29:07,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3246620.0, ans=0.125 2024-08-15 15:29:09,880 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.37 vs. limit=15.0 2024-08-15 15:29:11,625 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5850, loss[loss=0.1015, beats_loss=0.01128, ecapa_loss=0.0001623, whisper_loss=0.08856, over 21777.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01059, ecapa_loss=0.0001498, whisper_loss=0.09115, over 3892159.78 frames. 
], batch size: 89, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:29:21,323 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.886e-02 2024-08-15 15:29:24,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3246720.0, ans=0.125 2024-08-15 15:29:26,438 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.269e+01 2.533e+01 2.888e+01 4.930e+01, threshold=5.067e+01, percent-clipped=0.0 2024-08-15 15:29:44,175 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 20 from LS+wenet, 18 from Vox, 20 from AS 2024-08-15 15:30:05,180 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.28 vs. limit=22.5 2024-08-15 15:30:18,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3247120.0, ans=0.0 2024-08-15 15:30:25,917 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5900, loss[loss=0.1054, beats_loss=0.01109, ecapa_loss=0.0001515, whisper_loss=0.09284, over 15762.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01054, ecapa_loss=0.00015, whisper_loss=0.09157, over 3900992.70 frames. ], batch size: 63, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:30:27,568 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 20 from LS+wenet, 14 from Vox, 26 from AS 2024-08-15 15:30:48,641 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.12 vs. 
limit=15.0 2024-08-15 15:30:53,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3247320.0, ans=0.0 2024-08-15 15:31:00,230 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.41 vs. limit=15.0 2024-08-15 15:31:16,772 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.64 vs. limit=15.0 2024-08-15 15:31:30,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3247620.0, ans=0.125 2024-08-15 15:31:42,652 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 18 from LS+wenet, 30 from Vox, 43 from AS 2024-08-15 15:31:43,905 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 5950, loss[loss=0.07284, beats_loss=0.01341, ecapa_loss=0.0001589, whisper_loss=0.05784, over 21424.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01069, ecapa_loss=0.0001491, whisper_loss=0.09066, over 3908888.66 frames. ], batch size: 91, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:31:55,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3247720.0, ans=0.0 2024-08-15 15:31:58,713 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.298e+01 2.552e+01 2.863e+01 3.856e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-15 15:32:17,500 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 31 from LS+wenet, 22 from Vox, 41 from AS 2024-08-15 15:32:24,522 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 40 from LS+wenet, 19 from Vox, 34 from AS 2024-08-15 15:32:46,838 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.60 vs. 
limit=10.0 2024-08-15 15:32:54,482 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6000, loss[loss=0.09984, beats_loss=0.01042, ecapa_loss=0.0001736, whisper_loss=0.08768, over 19976.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01066, ecapa_loss=0.0001498, whisper_loss=0.09117, over 3876509.58 frames. ], batch size: 81, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:32:54,483 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-15 15:33:33,376 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on ASR_libri: loss=0.2517, beats_loss=0, ecapa_loss=0.0005302, whisper_loss=0.2464, over 922467.00 frames. 2024-08-15 15:33:54,201 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on SV_voxceleb1: loss=0.004186, beats_loss=0, ecapa_loss=0.0004186, whisper_loss=0, over 939242.00 frames. 2024-08-15 15:35:52,475 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on AT_audioset: loss=0.02334, beats_loss=0.02334, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 15:35:52,479 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-15 15:36:12,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3248320.0, ans=0.1 2024-08-15 15:36:21,227 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.53 vs. limit=15.0 2024-08-15 15:36:46,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3248520.0, ans=0.125 2024-08-15 15:36:48,373 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.27 vs. 
limit=22.5 2024-08-15 15:37:02,507 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6050, loss[loss=0.1107, beats_loss=0.01038, ecapa_loss=0.0001555, whisper_loss=0.09878, over 15311.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01059, ecapa_loss=0.0001496, whisper_loss=0.09093, over 3856528.57 frames. ], batch size: 63, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:37:16,249 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.402e+01 2.591e+01 2.977e+01 8.754e+01, threshold=5.182e+01, percent-clipped=1.0 2024-08-15 15:37:19,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3248820.0, ans=0.1 2024-08-15 15:37:32,596 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 20 from Vox, 30 from AS 2024-08-15 15:37:40,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3248920.0, ans=0.05 2024-08-15 15:37:45,168 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 28 from Vox, 23 from AS 2024-08-15 15:37:54,958 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 25 from LS+wenet, 18 from Vox, 31 from AS 2024-08-15 15:38:05,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3249120.0, ans=0.0 2024-08-15 15:38:12,802 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6100, loss[loss=0.1143, beats_loss=0.01004, ecapa_loss=0.000134, whisper_loss=0.1029, over 24072.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01061, ecapa_loss=0.0001484, whisper_loss=0.09107, over 3900886.76 frames. 
], batch size: 94, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:38:21,753 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 15:38:26,115 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 25 from Vox, 40 from AS 2024-08-15 15:38:38,811 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.76 vs. limit=12.0 2024-08-15 15:38:45,270 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 27 from LS+wenet, 14 from Vox, 26 from AS 2024-08-15 15:38:52,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3249420.0, ans=0.125 2024-08-15 15:39:22,901 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6150, loss[loss=0.09651, beats_loss=0.01184, ecapa_loss=0.0001678, whisper_loss=0.08299, over 23289.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01066, ecapa_loss=0.0001483, whisper_loss=0.0904, over 3892775.72 frames. ], batch size: 98, lr: 2.71e-03, grad_scale: 1.152921504606847e+18 2024-08-15 15:39:23,075 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 33 from LS+wenet, 22 from Vox, 35 from AS 2024-08-15 15:39:36,465 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.20 vs. limit=12.0 2024-08-15 15:39:36,547 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.99 vs. 
limit=15.0 2024-08-15 15:39:37,099 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.219e+01 2.437e+01 2.676e+01 4.381e+01, threshold=4.874e+01, percent-clipped=0.0 2024-08-15 15:39:46,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3249820.0, ans=0.2 2024-08-15 15:39:52,929 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 24 from LS+wenet, 29 from Vox, 36 from AS 2024-08-15 15:40:00,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3249920.0, ans=0.0 2024-08-15 15:40:00,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3249920.0, ans=0.1 2024-08-15 15:40:04,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3250020.0, ans=0.07 2024-08-15 15:40:13,354 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.17 vs. limit=15.0 2024-08-15 15:40:14,266 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 13 from Vox, 38 from AS 2024-08-15 15:40:34,288 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6200, loss[loss=0.1089, beats_loss=0.009528, ecapa_loss=0.0001543, whisper_loss=0.09787, over 22471.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01062, ecapa_loss=0.0001488, whisper_loss=0.09107, over 3901519.70 frames. 
], batch size: 89, lr: 2.71e-03, grad_scale: 1.152921504606847e+18 2024-08-15 15:40:34,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3250220.0, ans=0.125 2024-08-15 15:40:34,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3250220.0, ans=0.125 2024-08-15 15:40:45,510 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.87 vs. limit=22.5 2024-08-15 15:40:52,115 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 20 from Vox, 31 from AS 2024-08-15 15:41:36,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3250620.0, ans=0.0 2024-08-15 15:41:46,469 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6250, loss[loss=0.1019, beats_loss=0.01014, ecapa_loss=0.0001697, whisper_loss=0.09007, over 18070.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01066, ecapa_loss=0.0001494, whisper_loss=0.09045, over 3902472.75 frames. ], batch size: 73, lr: 2.71e-03, grad_scale: 1.152921504606847e+18 2024-08-15 15:41:50,483 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.30 vs. limit=10.0 2024-08-15 15:42:00,116 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.25 vs. 
limit=15.0 2024-08-15 15:42:01,934 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.383e+01 2.624e+01 2.920e+01 1.622e+02, threshold=5.248e+01, percent-clipped=1.0 2024-08-15 15:42:06,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3250820.0, ans=0.5 2024-08-15 15:42:13,724 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 19 from LS+wenet, 23 from Vox, 47 from AS 2024-08-15 15:42:40,849 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0 2024-08-15 15:42:49,756 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 27 from LS+wenet, 20 from Vox, 31 from AS 2024-08-15 15:42:56,425 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6300, loss[loss=0.09036, beats_loss=0.009806, ecapa_loss=0.0001582, whisper_loss=0.07897, over 13239.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01064, ecapa_loss=0.000149, whisper_loss=0.09074, over 3861639.35 frames. ], batch size: 54, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:43:02,563 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 31 from LS+wenet, 16 from Vox, 45 from AS 2024-08-15 15:43:02,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3251220.0, ans=0.09899494936611666 2024-08-15 15:43:13,506 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
30 from LS+wenet, 28 from Vox, 32 from AS 2024-08-15 15:43:13,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3251320.0, ans=0.1 2024-08-15 15:43:20,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3251320.0, ans=0.125 2024-08-15 15:43:25,847 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 21 from LS+wenet, 17 from Vox, 23 from AS 2024-08-15 15:43:29,956 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 25 from LS+wenet, 27 from Vox, 33 from AS 2024-08-15 15:43:32,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3251420.0, ans=0.125 2024-08-15 15:43:41,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3251520.0, ans=0.2 2024-08-15 15:43:47,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3251520.0, ans=0.125 2024-08-15 15:44:04,142 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 25 from LS+wenet, 12 from Vox, 28 from AS 2024-08-15 15:44:06,878 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6350, loss[loss=0.07613, beats_loss=0.01173, ecapa_loss=0.0001517, whisper_loss=0.06288, over 13257.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01066, ecapa_loss=0.0001494, whisper_loss=0.09006, over 3857173.37 frames. 
], batch size: 53, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:44:18,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3251720.0, ans=0.125 2024-08-15 15:44:22,022 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.294e+01 2.523e+01 2.815e+01 3.585e+01, threshold=5.047e+01, percent-clipped=0.0 2024-08-15 15:44:23,668 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 from AS 2024-08-15 15:44:25,026 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 20 from LS+wenet, 20 from Vox, 38 from AS 2024-08-15 15:44:37,362 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.09 vs. limit=15.0 2024-08-15 15:44:40,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3251920.0, ans=0.125 2024-08-15 15:44:47,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3251920.0, ans=0.2 2024-08-15 15:45:17,131 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.49 vs. limit=15.0 2024-08-15 15:45:17,839 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6400, loss[loss=0.1015, beats_loss=0.00993, ecapa_loss=0.0001516, whisper_loss=0.0901, over 23168.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01071, ecapa_loss=0.0001497, whisper_loss=0.09071, over 3878241.89 frames. 
], batch size: 93, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:45:28,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3252220.0, ans=0.2 2024-08-15 15:45:54,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3252420.0, ans=0.2 2024-08-15 15:45:59,562 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.80 vs. limit=12.0 2024-08-15 15:46:02,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3252520.0, ans=0.0 2024-08-15 15:46:19,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3252620.0, ans=0.1 2024-08-15 15:46:27,819 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6450, loss[loss=0.08921, beats_loss=0.009092, ecapa_loss=0.0001718, whisper_loss=0.0784, over 15925.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01059, ecapa_loss=0.0001498, whisper_loss=0.091, over 3873333.34 frames. ], batch size: 63, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:46:35,303 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 from AS 2024-08-15 15:46:36,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3252720.0, ans=0.125 2024-08-15 15:46:42,914 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+01 2.348e+01 2.696e+01 2.907e+01 4.718e+01, threshold=5.393e+01, percent-clipped=0.0 2024-08-15 15:46:54,668 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 17 from Vox, 24 from AS 2024-08-15 15:46:58,760 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
29 from LS+wenet, 19 from Vox, 45 from AS 2024-08-15 15:47:09,080 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 20 from Vox, 43 from AS 2024-08-15 15:47:14,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3253020.0, ans=0.125 2024-08-15 15:47:22,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3253020.0, ans=0.0 2024-08-15 15:47:32,284 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 16 from LS+wenet, 13 from Vox, 50 from AS 2024-08-15 15:47:40,865 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 20 from LS+wenet, 20 from Vox, 40 from AS 2024-08-15 15:47:41,874 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6500, loss[loss=0.08418, beats_loss=0.01181, ecapa_loss=0.0001165, whisper_loss=0.0712, over 20290.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01061, ecapa_loss=0.0001495, whisper_loss=0.09111, over 3875648.64 frames. ], batch size: 80, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:47:50,395 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 27 from LS+wenet, 21 from Vox, 36 from AS 2024-08-15 15:47:51,709 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 35 from Vox, 35 from AS 2024-08-15 15:47:53,899 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.00 vs. 
limit=15.0 2024-08-15 15:48:12,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3253420.0, ans=0.125 2024-08-15 15:48:27,787 WARNING [optim.py:496] (2/4) Scaling gradients by 0.07086287438869476, model_norm_threshold=53.929649353027344 2024-08-15 15:48:27,957 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.590e+04, grad_sumsq=8.502e+06, orig_rms_sq=1.010e-02 2024-08-15 15:48:36,933 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 12 from LS+wenet, 17 from Vox, 25 from AS 2024-08-15 15:48:40,770 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 23 from LS+wenet, 18 from Vox, 38 from AS 2024-08-15 15:48:45,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3253620.0, ans=0.0 2024-08-15 15:48:52,317 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 from AS 2024-08-15 15:48:53,517 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 23 from Vox, 32 from AS 2024-08-15 15:48:56,036 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6550, loss[loss=0.1122, beats_loss=0.008577, ecapa_loss=0.0001631, whisper_loss=0.102, over 22313.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01068, ecapa_loss=0.0001493, whisper_loss=0.0907, over 3887721.09 frames. ], batch size: 88, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:49:03,149 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
25 from LS+wenet, 20 from Vox, 23 from AS 2024-08-15 15:49:09,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3253820.0, ans=0.125 2024-08-15 15:49:11,803 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.371e+01 2.638e+01 2.935e+01 7.610e+02, threshold=5.275e+01, percent-clipped=2.0 2024-08-15 15:49:21,176 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=15.0 2024-08-15 15:49:36,728 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 18 from Vox, 32 from AS 2024-08-15 15:49:52,976 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 22 from LS+wenet, 24 from Vox, 36 from AS 2024-08-15 15:50:00,512 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.08 vs. limit=6.0 2024-08-15 15:50:07,736 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6600, loss[loss=0.1228, beats_loss=0.008317, ecapa_loss=0.0001723, whisper_loss=0.1128, over 22683.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01064, ecapa_loss=0.0001508, whisper_loss=0.09126, over 3930861.26 frames. ], batch size: 90, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:50:11,366 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 21 from LS+wenet, 17 from Vox, 41 from AS 2024-08-15 15:50:14,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3254220.0, ans=0.0 2024-08-15 15:50:30,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3254320.0, ans=0.0 2024-08-15 15:50:32,971 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 
28 from LS+wenet, 13 from Vox, 26 from AS 2024-08-15 15:50:34,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3254320.0, ans=0.0 2024-08-15 15:50:40,056 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 18 from Vox, 24 from AS 2024-08-15 15:50:45,320 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 16 from LS+wenet, 15 from Vox, 23 from AS 2024-08-15 15:50:49,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3254520.0, ans=0.05 2024-08-15 15:51:07,931 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 17 from LS+wenet, 22 from Vox, 34 from AS 2024-08-15 15:51:19,726 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6650, loss[loss=0.08042, beats_loss=0.01037, ecapa_loss=0.0001269, whisper_loss=0.06878, over 16475.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01063, ecapa_loss=0.0001499, whisper_loss=0.09119, over 3961758.05 frames. ], batch size: 63, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:51:24,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3254720.0, ans=0.125 2024-08-15 15:51:35,051 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.370e+01 2.592e+01 2.847e+01 4.238e+01, threshold=5.184e+01, percent-clipped=0.0 2024-08-15 15:51:54,282 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.08 vs. 
limit=15.0 2024-08-15 15:52:03,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3255020.0, ans=0.0 2024-08-15 15:52:03,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3255020.0, ans=10.0 2024-08-15 15:52:09,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3255020.0, ans=0.125 2024-08-15 15:52:22,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3255120.0, ans=0.125 2024-08-15 15:52:33,173 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6700, loss[loss=0.07618, beats_loss=0.01146, ecapa_loss=0.0001591, whisper_loss=0.06313, over 17672.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01059, ecapa_loss=0.00015, whisper_loss=0.09137, over 3934246.35 frames. ], batch size: 74, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:52:37,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3255220.0, ans=0.0 2024-08-15 15:52:42,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3255220.0, ans=0.125 2024-08-15 15:52:52,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3255320.0, ans=0.0 2024-08-15 15:52:54,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3255320.0, ans=0.2 2024-08-15 15:53:00,081 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.06 vs. limit=12.0 2024-08-15 15:53:02,341 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
30 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-15 15:53:02,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3255420.0, ans=0.025 2024-08-15 15:53:08,287 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 24 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-15 15:53:13,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3255420.0, ans=0.0 2024-08-15 15:53:14,108 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.21 vs. limit=15.0 2024-08-15 15:53:26,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3255520.0, ans=0.125 2024-08-15 15:53:40,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3255620.0, ans=0.125 2024-08-15 15:53:45,416 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6750, loss[loss=0.09361, beats_loss=0.01029, ecapa_loss=0.0001525, whisper_loss=0.08179, over 22898.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01057, ecapa_loss=0.0001491, whisper_loss=0.09164, over 3945153.64 frames. 
], batch size: 95, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:54:01,210 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.293e+01 2.545e+01 2.878e+01 4.170e+01, threshold=5.090e+01, percent-clipped=0.0 2024-08-15 15:54:14,742 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 15:54:18,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3255920.0, ans=0.1 2024-08-15 15:54:45,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3256120.0, ans=0.0 2024-08-15 15:54:51,493 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.92 vs. limit=22.5 2024-08-15 15:54:56,439 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6800, loss[loss=0.1013, beats_loss=0.01277, ecapa_loss=7.72e-05, whisper_loss=0.08771, over 15453.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01056, ecapa_loss=0.0001487, whisper_loss=0.09216, over 3944654.66 frames. ], batch size: 55, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:55:00,643 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-15 15:55:15,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3256320.0, ans=0.0 2024-08-15 15:55:22,288 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.24 vs. 
limit=15.0 2024-08-15 15:55:32,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3256420.0, ans=0.09899494936611666 2024-08-15 15:55:38,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3256520.0, ans=0.0 2024-08-15 15:55:43,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3256520.0, ans=0.5 2024-08-15 15:55:47,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3256520.0, ans=10.0 2024-08-15 15:55:50,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3256520.0, ans=0.125 2024-08-15 15:55:55,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3256620.0, ans=0.0 2024-08-15 15:55:59,454 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 21 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 15:56:06,394 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6850, loss[loss=0.0978, beats_loss=0.01055, ecapa_loss=0.0001247, whisper_loss=0.086, over 14606.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01057, ecapa_loss=0.000149, whisper_loss=0.09133, over 3926508.87 frames. ], batch size: 58, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:56:08,417 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.83 vs. limit=22.5 2024-08-15 15:56:12,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3256720.0, ans=0.2 2024-08-15 15:56:13,511 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 
14 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-15 15:56:18,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3256720.0, ans=0.0 2024-08-15 15:56:21,269 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-15 15:56:22,619 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.268e+01 2.467e+01 2.871e+01 7.953e+01, threshold=4.935e+01, percent-clipped=1.0 2024-08-15 15:56:27,434 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-15 15:56:56,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3257020.0, ans=0.125 2024-08-15 15:57:16,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3257120.0, ans=0.035 2024-08-15 15:57:19,231 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 26 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-15 15:57:20,494 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6900, loss[loss=0.1059, beats_loss=0.01229, ecapa_loss=0.0001646, whisper_loss=0.09192, over 19253.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01066, ecapa_loss=0.0001491, whisper_loss=0.09085, over 3922083.49 frames. 
], batch size: 77, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:57:32,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3257220.0, ans=0.125 2024-08-15 15:57:32,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3257220.0, ans=0.0 2024-08-15 15:57:35,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3257320.0, ans=0.09899494936611666 2024-08-15 15:57:37,340 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.24 vs. limit=22.5 2024-08-15 15:58:07,527 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 18 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-15 15:58:09,104 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 20 from LS+wenet, 28 from Vox, 48 fro AS 2024-08-15 15:58:19,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3257620.0, ans=0.0 2024-08-15 15:58:27,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3257620.0, ans=0.0 2024-08-15 15:58:30,619 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 15:58:34,011 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 6950, loss[loss=0.08418, beats_loss=0.01521, ecapa_loss=0.0001232, whisper_loss=0.06773, over 23264.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01075, ecapa_loss=0.0001481, whisper_loss=0.0902, over 3896146.88 frames. ], batch size: 93, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:58:35,521 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
23 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-15 15:58:49,573 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.345e+01 2.623e+01 2.937e+01 1.105e+02, threshold=5.245e+01, percent-clipped=3.0 2024-08-15 15:59:18,082 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 17 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-15 15:59:24,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3258020.0, ans=0.1 2024-08-15 15:59:28,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3258020.0, ans=0.125 2024-08-15 15:59:42,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3258120.0, ans=0.125 2024-08-15 15:59:44,404 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7000, loss[loss=0.106, beats_loss=0.01213, ecapa_loss=0.0001556, whisper_loss=0.09236, over 22131.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01075, ecapa_loss=0.0001506, whisper_loss=0.08938, over 3899849.40 frames. ], batch size: 92, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:59:54,634 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 30 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-15 16:00:10,931 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-15 16:00:32,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3258520.0, ans=0.2 2024-08-15 16:00:53,698 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7050, loss[loss=0.09413, beats_loss=0.009892, ecapa_loss=0.000144, whisper_loss=0.08279, over 17131.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01066, ecapa_loss=0.0001501, whisper_loss=0.08971, over 3898091.28 frames. 
], batch size: 66, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:00:55,437 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-15 16:00:57,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3258720.0, ans=0.125 2024-08-15 16:01:05,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3258720.0, ans=0.1 2024-08-15 16:01:08,933 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.667e+01 2.307e+01 2.519e+01 2.895e+01 2.053e+02, threshold=5.037e+01, percent-clipped=1.0 2024-08-15 16:01:11,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3258820.0, ans=0.125 2024-08-15 16:01:20,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3258920.0, ans=0.125 2024-08-15 16:01:22,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3258920.0, ans=0.125 2024-08-15 16:01:31,705 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 31 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-15 16:01:39,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=3259020.0, ans=0.02 2024-08-15 16:01:51,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3259120.0, ans=0.125 2024-08-15 16:01:52,937 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 
26 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-15 16:01:56,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3259120.0, ans=0.125 2024-08-15 16:02:01,888 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 16:02:04,200 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7100, loss[loss=0.1088, beats_loss=0.01179, ecapa_loss=0.0001422, whisper_loss=0.09563, over 18400.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01064, ecapa_loss=0.0001488, whisper_loss=0.09002, over 3859705.48 frames. ], batch size: 75, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:02:07,893 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.09 vs. limit=15.0 2024-08-15 16:02:12,125 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 18 from Vox, 50 fro AS 2024-08-15 16:02:30,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3259320.0, ans=0.2 2024-08-15 16:02:36,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3259420.0, ans=0.0 2024-08-15 16:02:40,310 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 13 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-15 16:02:58,615 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 21 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-15 16:03:07,194 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 19 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-15 16:03:15,604 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7150, loss[loss=0.0982, beats_loss=0.01046, ecapa_loss=0.000175, whisper_loss=0.08598, over 20442.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01073, ecapa_loss=0.0001473, whisper_loss=0.08995, over 3869563.17 frames. ], batch size: 86, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:03:20,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3259720.0, ans=0.0 2024-08-15 16:03:31,251 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.305e+01 2.549e+01 2.852e+01 2.933e+02, threshold=5.099e+01, percent-clipped=1.0 2024-08-15 16:03:42,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3259820.0, ans=0.125 2024-08-15 16:04:08,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3260020.0, ans=0.125 2024-08-15 16:04:11,525 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-15 16:04:12,839 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 25 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-15 16:04:18,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=3260120.0, ans=0.02 2024-08-15 16:04:26,733 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7200, loss[loss=0.1066, beats_loss=0.01056, ecapa_loss=0.0001597, whisper_loss=0.09442, over 22537.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01072, ecapa_loss=0.0001483, whisper_loss=0.08949, over 3876722.55 frames. ], batch size: 92, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:04:28,872 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0 2024-08-15 16:04:48,190 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
15 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-15 16:05:00,618 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 33 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-15 16:05:02,022 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 32 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-15 16:05:10,413 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.18 vs. limit=12.0 2024-08-15 16:05:10,848 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-15 16:05:29,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3260620.0, ans=0.1 2024-08-15 16:05:37,221 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7250, loss[loss=0.1169, beats_loss=0.008344, ecapa_loss=0.000166, whisper_loss=0.1069, over 23644.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01062, ecapa_loss=0.0001493, whisper_loss=0.09067, over 3881765.39 frames. ], batch size: 94, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:05:40,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3260720.0, ans=0.0 2024-08-15 16:05:44,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3260720.0, ans=0.0 2024-08-15 16:05:52,177 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.362e+01 2.587e+01 2.816e+01 1.917e+02, threshold=5.173e+01, percent-clipped=1.0 2024-08-15 16:05:54,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3260820.0, ans=0.05 2024-08-15 16:05:55,754 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.96 vs. 
limit=22.5 2024-08-15 16:06:07,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3260920.0, ans=0.0 2024-08-15 16:06:12,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3260920.0, ans=0.0 2024-08-15 16:06:18,600 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.78 vs. limit=15.0 2024-08-15 16:06:19,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3261020.0, ans=0.05 2024-08-15 16:06:26,427 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-15 16:06:26,905 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0 2024-08-15 16:06:41,802 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-15 16:06:46,845 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7300, loss[loss=0.107, beats_loss=0.01038, ecapa_loss=0.0001566, whisper_loss=0.0951, over 22824.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01068, ecapa_loss=0.0001491, whisper_loss=0.09018, over 3895494.08 frames. ], batch size: 94, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:06:54,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3261220.0, ans=0.95 2024-08-15 16:06:58,146 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
35 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-15 16:07:02,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3261320.0, ans=0.125 2024-08-15 16:07:05,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3261320.0, ans=0.0 2024-08-15 16:07:09,123 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-15 16:07:21,361 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.12 vs. limit=22.5 2024-08-15 16:07:25,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3261420.0, ans=0.5 2024-08-15 16:07:26,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3261420.0, ans=0.2 2024-08-15 16:07:28,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3261520.0, ans=0.125 2024-08-15 16:07:31,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3261520.0, ans=0.125 2024-08-15 16:07:35,156 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 17 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-15 16:07:46,279 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-15 16:07:51,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3261620.0, ans=0.5 2024-08-15 16:07:54,171 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.96 vs. 
limit=6.0 2024-08-15 16:07:57,231 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7350, loss[loss=0.1004, beats_loss=0.01225, ecapa_loss=0.0001294, whisper_loss=0.08689, over 21859.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01063, ecapa_loss=0.0001492, whisper_loss=0.0907, over 3906203.64 frames. ], batch size: 91, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:07:57,712 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-15 16:08:13,239 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.328e+01 2.533e+01 2.862e+01 3.908e+01, threshold=5.065e+01, percent-clipped=0.0 2024-08-15 16:08:22,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3261820.0, ans=0.125 2024-08-15 16:08:23,722 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 16 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-15 16:08:34,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3261920.0, ans=0.125 2024-08-15 16:08:46,208 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 25 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-15 16:08:47,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3262020.0, ans=0.125 2024-08-15 16:09:00,239 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 16 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-15 16:09:08,247 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7400, loss[loss=0.09304, beats_loss=0.01149, ecapa_loss=0.000155, whisper_loss=0.07999, over 16267.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01066, ecapa_loss=0.0001505, whisper_loss=0.09032, over 3903790.89 frames. ], batch size: 65, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:09:29,637 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
33 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-15 16:09:46,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3262420.0, ans=0.2 2024-08-15 16:10:03,023 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.15 vs. limit=12.0 2024-08-15 16:10:05,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3262620.0, ans=0.125 2024-08-15 16:10:14,116 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 21 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-15 16:10:17,745 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7450, loss[loss=0.08648, beats_loss=0.01217, ecapa_loss=0.0001524, whisper_loss=0.07278, over 21520.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0107, ecapa_loss=0.0001502, whisper_loss=0.0901, over 3940700.25 frames. ], batch size: 93, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:10:20,684 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 16:10:32,935 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.336e+01 2.535e+01 2.838e+01 5.757e+01, threshold=5.069e+01, percent-clipped=1.0 2024-08-15 16:10:56,038 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.40 vs. 
limit=15.0 2024-08-15 16:11:03,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3263020.0, ans=0.0 2024-08-15 16:11:12,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3263120.0, ans=0.125 2024-08-15 16:11:27,192 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7500, loss[loss=0.1051, beats_loss=0.01162, ecapa_loss=0.0001337, whisper_loss=0.09219, over 17122.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01068, ecapa_loss=0.0001507, whisper_loss=0.09035, over 3926825.94 frames. ], batch size: 69, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:11:37,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3263220.0, ans=0.5 2024-08-15 16:11:40,093 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 38 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-15 16:11:41,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3263320.0, ans=0.0 2024-08-15 16:11:52,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3263320.0, ans=0.0 2024-08-15 16:12:07,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3263420.0, ans=0.125 2024-08-15 16:12:13,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3263520.0, ans=0.1 2024-08-15 16:12:21,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3263520.0, ans=0.125 2024-08-15 16:12:22,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3263620.0, ans=0.125 2024-08-15 
16:12:23,730 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 31 from LS+wenet, 10 from Vox, 28 from AS
2024-08-15 16:12:23,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3263620.0, ans=0.125
2024-08-15 16:12:24,066 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.82 vs. limit=10.0
2024-08-15 16:12:26,281 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 27 from Vox, 35 from AS
2024-08-15 16:12:37,236 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7550, loss[loss=0.0993, beats_loss=0.01166, ecapa_loss=0.0001479, whisper_loss=0.08616, over 22527.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.0001519, whisper_loss=0.09064, over 3904958.11 frames. ], batch size: 91, lr: 2.70e-03, grad_scale: 5.764607523034235e+17
2024-08-15 16:12:37,964 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.03 vs. limit=22.5
2024-08-15 16:12:50,532 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 17 from LS+wenet, 15 from Vox, 40 from AS
2024-08-15 16:12:52,884 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.288e+01 2.542e+01 2.895e+01 9.119e+01, threshold=5.085e+01, percent-clipped=2.0
2024-08-15 16:13:07,142 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 30 from LS+wenet, 17 from Vox, 32 from AS
2024-08-15 16:13:26,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3264020.0, ans=0.1
2024-08-15 16:13:27,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3264020.0, ans=0.2
2024-08-15 16:13:43,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3264120.0, ans=0.0
2024-08-15 16:13:48,451 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7600, loss[loss=0.09326, beats_loss=0.01142, ecapa_loss=0.0001704, whisper_loss=0.08014, over 19866.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01061, ecapa_loss=0.0001507, whisper_loss=0.09004, over 3877683.02 frames. ], batch size: 85, lr: 2.70e-03, grad_scale: 5.764607523034235e+17
2024-08-15 16:14:10,847 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-15 16:14:20,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3264420.0, ans=0.0
2024-08-15 16:14:22,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3264420.0, ans=0.0
2024-08-15 16:14:27,852 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 20 from Vox, 38 from AS
2024-08-15 16:14:58,923 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 23 from LS+wenet, 25 from Vox, 32 from AS
2024-08-15 16:15:00,128 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7650, loss[loss=0.09887, beats_loss=0.009669, ecapa_loss=0.000146, whisper_loss=0.08774, over 20998.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01059, ecapa_loss=0.0001504, whisper_loss=0.09018, over 3910927.34 frames. ], batch size: 80, lr: 2.70e-03, grad_scale: 5.764607523034235e+17
2024-08-15 16:15:03,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3264720.0, ans=0.125
2024-08-15 16:15:15,832 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.326e+01 2.582e+01 2.912e+01 5.220e+01, threshold=5.164e+01, percent-clipped=1.0
2024-08-15 16:15:36,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3264920.0, ans=0.1
2024-08-15 16:15:37,988 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.30 vs. limit=22.5
2024-08-15 16:15:49,894 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 16 from LS+wenet, 20 from Vox, 21 from AS
2024-08-15 16:15:56,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3265120.0, ans=0.2
2024-08-15 16:15:57,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3265120.0, ans=0.125
2024-08-15 16:16:11,176 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7700, loss[loss=0.09056, beats_loss=0.01161, ecapa_loss=0.000159, whisper_loss=0.07736, over 17789.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01067, ecapa_loss=0.000151, whisper_loss=0.08907, over 3890907.86 frames. ], batch size: 73, lr: 2.70e-03, grad_scale: 5.764607523034235e+17
2024-08-15 16:16:16,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3265220.0, ans=0.0
2024-08-15 16:16:17,731 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 25 from LS+wenet, 16 from Vox, 24 from AS
2024-08-15 16:16:33,240 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 17 from LS+wenet, 26 from Vox, 30 from AS
2024-08-15 16:16:39,128 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 from AS
2024-08-15 16:16:51,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3265420.0, ans=0.09899494936611666
2024-08-15 16:17:04,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3265520.0, ans=0.2
2024-08-15 16:17:21,054 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.64 vs. limit=22.5
2024-08-15 16:17:24,485 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7750, loss[loss=0.1018, beats_loss=0.01135, ecapa_loss=0.0001271, whisper_loss=0.0892, over 22885.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01059, ecapa_loss=0.0001518, whisper_loss=0.08984, over 3885756.72 frames. ], batch size: 89, lr: 2.70e-03, grad_scale: 5.764607523034235e+17
2024-08-15 16:17:39,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3265720.0, ans=0.1
2024-08-15 16:17:43,048 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 24 from LS+wenet, 12 from Vox, 18 from AS
2024-08-15 16:17:46,602 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.309e+01 2.587e+01 2.792e+01 3.462e+01, threshold=5.174e+01, percent-clipped=0.0
2024-08-15 16:18:01,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3265820.0, ans=0.125
2024-08-15 16:18:19,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3266020.0, ans=0.125
2024-08-15 16:18:19,898 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.07 vs. limit=15.0
2024-08-15 16:18:22,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3266020.0, ans=0.125
2024-08-15 16:18:23,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3266020.0, ans=0.0
2024-08-15 16:18:29,818 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0
2024-08-15 16:18:30,228 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 23 from LS+wenet, 20 from Vox, 48 from AS
2024-08-15 16:18:34,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3266120.0, ans=0.125
2024-08-15 16:18:50,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3266220.0, ans=0.125
2024-08-15 16:18:50,806 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7800, loss[loss=0.08495, beats_loss=0.01052, ecapa_loss=0.0001514, whisper_loss=0.07291, over 20130.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01056, ecapa_loss=0.0001506, whisper_loss=0.09004, over 3901974.79 frames. ], batch size: 82, lr: 2.70e-03, grad_scale: 5.764607523034235e+17
2024-08-15 16:19:31,550 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 18 from Vox, 33 from AS
2024-08-15 16:19:31,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3266420.0, ans=0.0
2024-08-15 16:19:58,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3266520.0, ans=0.09899494936611666
2024-08-15 16:20:10,267 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.60 vs. limit=15.0
2024-08-15 16:20:31,305 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 29 from Vox, 38 from AS
2024-08-15 16:20:33,328 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7850, loss[loss=0.08045, beats_loss=0.01138, ecapa_loss=0.0001861, whisper_loss=0.0672, over 20541.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01057, ecapa_loss=0.0001502, whisper_loss=0.09023, over 3906572.64 frames. ], batch size: 88, lr: 2.70e-03, grad_scale: 5.764607523034235e+17
2024-08-15 16:20:33,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3266720.0, ans=0.0
2024-08-15 16:20:41,658 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 17 from LS+wenet, 23 from Vox, 28 from AS
2024-08-15 16:20:56,111 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.312e+01 2.657e+01 2.999e+01 5.998e+01, threshold=5.314e+01, percent-clipped=1.0
2024-08-15 16:20:56,256 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 18 from LS+wenet, 16 from Vox, 37 from AS
2024-08-15 16:21:02,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3266820.0, ans=0.0
2024-08-15 16:21:16,816 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 23 from LS+wenet, 18 from Vox, 40 from AS
2024-08-15 16:21:22,135 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.66 vs. limit=15.0
2024-08-15 16:21:24,236 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 27 from LS+wenet, 26 from Vox, 42 from AS
2024-08-15 16:21:37,471 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 14 from LS+wenet, 24 from Vox, 24 from AS
2024-08-15 16:21:47,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3267020.0, ans=0.09899494936611666
2024-08-15 16:22:03,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3267120.0, ans=0.0
2024-08-15 16:22:14,047 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 22 from LS+wenet, 20 from Vox, 40 from AS
2024-08-15 16:22:21,339 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7900, loss[loss=0.08086, beats_loss=0.01075, ecapa_loss=0.0001912, whisper_loss=0.06819, over 18876.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01066, ecapa_loss=0.0001494, whisper_loss=0.0903, over 3915116.73 frames. ], batch size: 79, lr: 2.70e-03, grad_scale: 5.764607523034235e+17
2024-08-15 16:22:23,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3267220.0, ans=0.0
2024-08-15 16:23:13,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3267420.0, ans=0.1
2024-08-15 16:23:14,311 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 14 from Vox, 28 from AS
2024-08-15 16:23:25,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3267420.0, ans=0.05
2024-08-15 16:24:27,089 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 7950, loss[loss=0.09428, beats_loss=0.01143, ecapa_loss=0.0001763, whisper_loss=0.08109, over 23019.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01067, ecapa_loss=0.0001492, whisper_loss=0.09052, over 3928293.72 frames. ], batch size: 97, lr: 2.70e-03, grad_scale: 5.764607523034235e+17
2024-08-15 16:24:38,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3267720.0, ans=0.05
2024-08-15 16:24:43,533 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 37 from LS+wenet, 22 from Vox, 25 from AS
2024-08-15 16:24:51,868 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 29 from LS+wenet, 26 from Vox, 33 from AS
2024-08-15 16:24:53,889 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.363e+01 2.541e+01 2.931e+01 3.622e+01, threshold=5.082e+01, percent-clipped=0.0
2024-08-15 16:25:36,651 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=11.52 vs. limit=12.0
2024-08-15 16:25:44,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3268020.0, ans=0.0
2024-08-15 16:26:10,687 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.14 vs. limit=15.0
2024-08-15 16:26:19,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3268120.0, ans=0.125
2024-08-15 16:26:33,017 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8000, loss[loss=0.1058, beats_loss=0.01056, ecapa_loss=0.0001336, whisper_loss=0.09388, over 22089.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01059, ecapa_loss=0.0001492, whisper_loss=0.09086, over 3908523.13 frames. ], batch size: 87, lr: 2.70e-03, grad_scale: 5.764607523034235e+17
2024-08-15 16:26:33,213 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 32 from LS+wenet, 24 from Vox, 29 from AS
2024-08-15 16:26:49,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3268220.0, ans=0.0
2024-08-15 16:27:31,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3268420.0, ans=0.125
2024-08-15 16:28:00,861 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 34 from LS+wenet, 15 from Vox, 28 from AS
2024-08-15 16:28:07,440 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 17 from Vox, 45 from AS
2024-08-15 16:28:15,914 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8050, loss[loss=0.09046, beats_loss=0.01298, ecapa_loss=0.0001486, whisper_loss=0.07599, over 19728.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.0001483, whisper_loss=0.09076, over 3905979.23 frames. ], batch size: 82, lr: 2.70e-03, grad_scale: 5.764607523034235e+17
2024-08-15 16:28:25,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3268720.0, ans=0.0
2024-08-15 16:28:30,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3268820.0, ans=0.125
2024-08-15 16:28:30,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3268820.0, ans=0.125
2024-08-15 16:28:32,996 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.283e+01 2.526e+01 2.890e+01 4.835e+01, threshold=5.052e+01, percent-clipped=0.0
2024-08-15 16:28:53,206 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.00 vs. limit=12.0
2024-08-15 16:29:15,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3269020.0, ans=0.0
2024-08-15 16:29:21,327 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 22 from LS+wenet, 21 from Vox, 34 from AS
2024-08-15 16:29:35,050 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8100, loss[loss=0.08882, beats_loss=0.01441, ecapa_loss=0.0001254, whisper_loss=0.07316, over 15094.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.0001483, whisper_loss=0.0905, over 3901226.33 frames. ], batch size: 61, lr: 2.70e-03, grad_scale: 5.764607523034235e+17
2024-08-15 16:29:50,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3269320.0, ans=0.125
2024-08-15 16:29:59,048 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-15 16:30:04,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3269320.0, ans=0.5
2024-08-15 16:30:05,130 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.98 vs. limit=15.0
2024-08-15 16:30:15,900 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.39 vs. limit=15.0
2024-08-15 16:30:39,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3269620.0, ans=0.09899494936611666
2024-08-15 16:30:48,691 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 24 from LS+wenet, 25 from Vox, 24 from AS
2024-08-15 16:30:56,638 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8150, loss[loss=0.07716, beats_loss=0.011, ecapa_loss=0.0001555, whisper_loss=0.0646, over 21580.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01046, ecapa_loss=0.0001486, whisper_loss=0.09062, over 3912863.12 frames. ], batch size: 86, lr: 2.70e-03, grad_scale: 5.764607523034235e+17
2024-08-15 16:31:05,073 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0
2024-08-15 16:31:15,493 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.201e+01 2.455e+01 2.771e+01 3.780e+01, threshold=4.910e+01, percent-clipped=0.0
2024-08-15 16:31:27,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3269820.0, ans=0.125
2024-08-15 16:31:48,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3270020.0, ans=0.1
2024-08-15 16:31:54,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3270020.0, ans=0.125
2024-08-15 16:32:02,486 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 25 from LS+wenet, 27 from Vox, 36 from AS
2024-08-15 16:32:16,908 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8200, loss[loss=0.08835, beats_loss=0.01155, ecapa_loss=0.0001147, whisper_loss=0.07565, over 19321.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01045, ecapa_loss=0.0001492, whisper_loss=0.0912, over 3897760.61 frames. ], batch size: 76, lr: 2.70e-03, grad_scale: 5.764607523034235e+17
2024-08-15 16:32:27,489 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 23 from LS+wenet, 15 from Vox, 26 from AS
2024-08-15 16:32:45,687 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 29 from LS+wenet, 18 from Vox, 31 from AS
2024-08-15 16:32:47,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3270420.0, ans=10.0
2024-08-15 16:33:25,939 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 20 from LS+wenet, 17 from Vox, 18 from AS
2024-08-15 16:33:34,019 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8250, loss[loss=0.1014, beats_loss=0.009419, ecapa_loss=0.0001298, whisper_loss=0.09064, over 18140.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01039, ecapa_loss=0.00015, whisper_loss=0.09159, over 3890954.46 frames. ], batch size: 70, lr: 2.70e-03, grad_scale: 5.764607523034235e+17
2024-08-15 16:33:35,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3270720.0, ans=0.2
2024-08-15 16:33:37,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3270720.0, ans=0.125
2024-08-15 16:33:49,681 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 20 from Vox, 33 from AS
2024-08-15 16:33:50,942 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.400e+01 2.685e+01 3.048e+01 2.636e+02, threshold=5.369e+01, percent-clipped=3.0
2024-08-15 16:34:01,697 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 20 from LS+wenet, 19 from Vox, 29 from AS
2024-08-15 16:34:03,143 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 13 from LS+wenet, 11 from Vox, 32 from AS
2024-08-15 16:34:06,369 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-15 16:34:06,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3270920.0, ans=0.125
2024-08-15 16:34:31,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3271020.0, ans=0.1
2024-08-15 16:34:31,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3271020.0, ans=0.125
2024-08-15 16:34:31,609 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.56 vs. limit=15.0
2024-08-15 16:34:43,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3271120.0, ans=0.125
2024-08-15 16:34:44,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3271120.0, ans=0.125
2024-08-15 16:34:48,512 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8300, loss[loss=0.1023, beats_loss=0.01216, ecapa_loss=0.0001426, whisper_loss=0.08866, over 22544.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0105, ecapa_loss=0.0001481, whisper_loss=0.091, over 3918324.41 frames. ], batch size: 90, lr: 2.70e-03, grad_scale: 1.152921504606847e+18
2024-08-15 16:35:08,761 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 19 from LS+wenet, 30 from Vox, 34 from AS
2024-08-15 16:35:10,791 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.06 vs. limit=15.0
2024-08-15 16:35:11,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3271320.0, ans=0.125
2024-08-15 16:35:14,486 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 41 from LS+wenet, 14 from Vox, 38 from AS
2024-08-15 16:35:40,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3271520.0, ans=0.025
2024-08-15 16:35:42,499 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.297e+00
2024-08-15 16:36:02,988 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8350, loss[loss=0.1076, beats_loss=0.011, ecapa_loss=0.0001131, whisper_loss=0.09551, over 23475.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01054, ecapa_loss=0.0001473, whisper_loss=0.09082, over 3922823.85 frames. ], batch size: 90, lr: 2.70e-03, grad_scale: 1.152921504606847e+18
2024-08-15 16:36:10,206 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 from AS
2024-08-15 16:36:13,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3271720.0, ans=0.1
2024-08-15 16:36:18,846 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.348e+01 2.577e+01 2.897e+01 4.165e+01, threshold=5.153e+01, percent-clipped=0.0
2024-08-15 16:36:31,835 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 from AS
2024-08-15 16:36:38,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3271920.0, ans=0.125
2024-08-15 16:36:45,286 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 18 from LS+wenet, 23 from Vox, 24 from AS
2024-08-15 16:37:10,040 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 22 from LS+wenet, 22 from Vox, 19 from AS
2024-08-15 16:37:11,510 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 22 from Vox, 27 from AS
2024-08-15 16:37:17,077 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8400, loss[loss=0.1059, beats_loss=0.01223, ecapa_loss=0.0001138, whisper_loss=0.09253, over 23065.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01045, ecapa_loss=0.000149, whisper_loss=0.09099, over 3938358.89 frames. ], batch size: 89, lr: 2.70e-03, grad_scale: 5.764607523034235e+17
2024-08-15 16:37:37,296 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 20 from LS+wenet, 22 from Vox, 42 from AS
2024-08-15 16:38:15,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3272520.0, ans=0.0
2024-08-15 16:38:17,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3272620.0, ans=0.0
2024-08-15 16:38:18,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3272620.0, ans=0.05
2024-08-15 16:38:32,775 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8450, loss[loss=0.0958, beats_loss=0.01103, ecapa_loss=0.0001294, whisper_loss=0.08347, over 17261.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01051, ecapa_loss=0.0001487, whisper_loss=0.09044, over 3931699.66 frames. ], batch size: 64, lr: 2.70e-03, grad_scale: 5.764607523034235e+17
2024-08-15 16:38:41,170 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.074e+01
2024-08-15 16:38:42,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3272720.0, ans=0.125
2024-08-15 16:38:50,525 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.308e+01 2.508e+01 2.815e+01 5.021e+01, threshold=5.016e+01, percent-clipped=0.0
2024-08-15 16:38:56,309 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.59 vs. limit=12.0
2024-08-15 16:39:20,469 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 30 from LS+wenet, 15 from Vox, 44 from AS
2024-08-15 16:39:29,351 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 15 from LS+wenet, 21 from Vox, 26 from AS
2024-08-15 16:39:35,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3273120.0, ans=0.125
2024-08-15 16:39:36,877 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 25 from Vox, 37 from AS
2024-08-15 16:39:47,655 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8500, loss[loss=0.07799, beats_loss=0.01553, ecapa_loss=0.0001187, whisper_loss=0.06127, over 21663.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01056, ecapa_loss=0.0001487, whisper_loss=0.09012, over 3903961.98 frames. ], batch size: 88, lr: 2.70e-03, grad_scale: 5.764607523034235e+17
2024-08-15 16:39:52,264 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 19 from LS+wenet, 16 from Vox, 32 from AS
2024-08-15 16:40:00,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3273220.0, ans=0.125
2024-08-15 16:40:04,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3273320.0, ans=0.125
2024-08-15 16:40:04,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3273320.0, ans=0.0
2024-08-15 16:40:08,801 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.51 vs. limit=15.0
2024-08-15 16:40:14,393 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 27 from LS+wenet, 25 from Vox, 27 from AS
2024-08-15 16:40:19,106 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 22 from LS+wenet, 20 from Vox, 45 from AS
2024-08-15 16:40:23,894 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 23 from LS+wenet, 11 from Vox, 34 from AS
2024-08-15 16:40:24,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3273420.0, ans=0.1
2024-08-15 16:40:48,201 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0
2024-08-15 16:41:05,138 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8550, loss[loss=0.107, beats_loss=0.009947, ecapa_loss=0.0001137, whisper_loss=0.09592, over 16334.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01064, ecapa_loss=0.000148, whisper_loss=0.09001, over 3897360.29 frames. ], batch size: 60, lr: 2.70e-03, grad_scale: 5.764607523034235e+17
2024-08-15 16:41:13,161 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 31 from LS+wenet, 22 from Vox, 28 from AS
2024-08-15 16:41:13,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3273720.0, ans=0.125
2024-08-15 16:41:23,403 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.398e+01 2.637e+01 2.998e+01 4.357e+01, threshold=5.275e+01, percent-clipped=0.0
2024-08-15 16:41:36,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3273920.0, ans=0.0
2024-08-15 16:41:52,609 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 20 from LS+wenet, 30 from Vox, 30 from AS
2024-08-15 16:41:53,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=3274020.0, ans=15.0
2024-08-15 16:42:15,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3274120.0, ans=0.04949747468305833
2024-08-15 16:42:16,884 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.84 vs. limit=15.0
2024-08-15 16:42:21,698 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8600, loss[loss=0.1087, beats_loss=0.01141, ecapa_loss=0.0001147, whisper_loss=0.09611, over 18256.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0106, ecapa_loss=0.0001483, whisper_loss=0.09057, over 3876019.60 frames. ], batch size: 70, lr: 2.70e-03, grad_scale: 5.764607523034235e+17
2024-08-15 16:42:28,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3274220.0, ans=0.0
2024-08-15 16:42:41,883 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 27 from LS+wenet, 14 from Vox, 29 from AS
2024-08-15 16:42:55,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3274420.0, ans=0.125
2024-08-15 16:43:13,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3274520.0, ans=0.125
2024-08-15 16:43:13,898 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=15.0
2024-08-15 16:43:20,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3274520.0, ans=0.0
2024-08-15 16:43:28,877 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 17 from Vox, 48 from AS
2024-08-15 16:43:30,710 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 18 from LS+wenet, 24 from Vox, 33 from AS
2024-08-15 16:43:36,836 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 20 from LS+wenet, 14 from Vox, 20 from AS
2024-08-15 16:43:37,996 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8650, loss[loss=0.1055, beats_loss=0.00933, ecapa_loss=0.0001424, whisper_loss=0.0947, over 13647.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01069, ecapa_loss=0.0001484, whisper_loss=0.09008, over 3884403.06 frames. ], batch size: 54, lr: 2.70e-03, grad_scale: 5.764607523034235e+17
2024-08-15 16:43:43,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3274720.0, ans=0.125
2024-08-15 16:43:47,128 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 26 from Vox, 24 from AS
2024-08-15 16:43:55,357 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0
2024-08-15 16:43:55,932 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.305e+01 2.531e+01 2.832e+01 4.112e+01, threshold=5.062e+01, percent-clipped=0.0
2024-08-15 16:44:12,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3274920.0, ans=0.0
2024-08-15 16:44:29,378 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 23 from Vox, 28 from AS
2024-08-15 16:44:34,375 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.053e-02
2024-08-15 16:44:48,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3275120.0, ans=0.0
2024-08-15 16:44:52,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3275220.0, ans=0.1
2024-08-15 16:44:53,544 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8700, loss[loss=0.1145, beats_loss=0.008221, ecapa_loss=0.0001401, whisper_loss=0.1048, over 14835.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01063, ecapa_loss=0.0001476, whisper_loss=0.09035, over 3877123.32 frames. ], batch size: 54, lr: 2.70e-03, grad_scale: 5.764607523034235e+17
2024-08-15 16:44:55,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3275220.0, ans=0.125
2024-08-15 16:45:03,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3275220.0, ans=0.95
2024-08-15 16:45:20,300 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 25 from LS+wenet, 23 from Vox, 44 from AS
2024-08-15 16:45:20,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3275320.0, ans=0.125
2024-08-15 16:45:42,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3275520.0, ans=0.0
2024-08-15 16:45:50,905 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 30 from LS+wenet, 27 from Vox, 29 from AS
2024-08-15 16:45:52,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3275520.0, ans=0.5
2024-08-15 16:45:59,108 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 25 from LS+wenet, 20 from Vox, 36 from AS
2024-08-15 16:46:11,309 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8750, loss[loss=0.115, beats_loss=0.008125, ecapa_loss=0.0001564, whisper_loss=0.1053, over 20519.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01053, ecapa_loss=0.0001489, whisper_loss=0.09073, over 3871192.08 frames. ], batch size: 81, lr: 2.70e-03, grad_scale: 5.764607523034235e+17
2024-08-15 16:46:14,060 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.39 vs. limit=15.0
2024-08-15 16:46:29,367 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.332e+01 2.573e+01 2.934e+01 5.671e+01, threshold=5.146e+01, percent-clipped=2.0
2024-08-15 16:47:01,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3275920.0, ans=0.125
2024-08-15 16:47:06,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3276020.0, ans=0.125
2024-08-15 16:47:12,715 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 15 from Vox, 44 from AS
2024-08-15 16:47:13,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3276020.0, ans=0.2
2024-08-15 16:47:28,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3276120.0, ans=0.125
2024-08-15 16:47:37,096 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.97 vs. limit=22.5
2024-08-15 16:47:39,052 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8800, loss[loss=0.1096, beats_loss=0.009522, ecapa_loss=0.0001439, whisper_loss=0.09868, over 23915.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01056, ecapa_loss=0.0001489, whisper_loss=0.09139, over 3884056.70 frames. ], batch size: 93, lr: 2.70e-03, grad_scale: 5.764607523034235e+17
2024-08-15 16:47:39,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3276220.0, ans=0.0
2024-08-15 16:47:45,626 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.68 vs. limit=15.0
2024-08-15 16:48:08,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3276320.0, ans=0.125
2024-08-15 16:48:08,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3276320.0, ans=0.0
2024-08-15 16:48:11,768 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 26 from LS+wenet, 17 from Vox, 35 from AS
2024-08-15 16:48:19,991 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 19 from LS+wenet, 24 from Vox, 44 from AS
2024-08-15 16:48:44,450 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts.
18 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-15 16:49:07,198 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 27 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-15 16:49:12,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3276620.0, ans=0.125 2024-08-15 16:49:15,403 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8850, loss[loss=0.07422, beats_loss=0.01364, ecapa_loss=0.000122, whisper_loss=0.05936, over 22126.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0106, ecapa_loss=0.0001492, whisper_loss=0.0904, over 3878251.85 frames. ], batch size: 93, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:49:36,987 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.293e+01 2.672e+01 3.024e+01 1.700e+02, threshold=5.345e+01, percent-clipped=3.0 2024-08-15 16:49:37,249 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 23 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-15 16:49:55,538 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-15 16:50:10,907 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-15 16:50:15,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3277020.0, ans=0.025 2024-08-15 16:50:17,641 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.89 vs. limit=10.0 2024-08-15 16:50:22,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3277120.0, ans=0.0 2024-08-15 16:50:35,610 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8900, loss[loss=0.1123, beats_loss=0.0099, ecapa_loss=0.0001542, whisper_loss=0.1008, over 17898.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01064, ecapa_loss=0.0001485, whisper_loss=0.09012, over 3856250.86 frames. ], batch size: 71, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:50:38,726 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 17 from LS+wenet, 9 from Vox, 39 fro AS 2024-08-15 16:50:42,566 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.53 vs. limit=10.0 2024-08-15 16:50:46,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3277220.0, ans=0.1 2024-08-15 16:50:47,778 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 30 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-15 16:50:54,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3277320.0, ans=0.125 2024-08-15 16:51:27,717 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-15 16:51:36,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3277620.0, ans=0.0 2024-08-15 16:51:36,979 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.32 vs. limit=15.0 2024-08-15 16:51:49,311 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 8950, loss[loss=0.09231, beats_loss=0.0104, ecapa_loss=0.0001903, whisper_loss=0.08001, over 21176.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01064, ecapa_loss=0.0001482, whisper_loss=0.0902, over 3844023.09 frames. ], batch size: 89, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:52:04,468 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 
33 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-15 16:52:06,969 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.302e+01 2.531e+01 2.768e+01 4.662e+01, threshold=5.062e+01, percent-clipped=0.0 2024-08-15 16:52:53,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3278120.0, ans=0.125 2024-08-15 16:52:54,911 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-15 16:53:01,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3278220.0, ans=0.0 2024-08-15 16:53:01,964 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9000, loss[loss=0.09406, beats_loss=0.01343, ecapa_loss=0.0001211, whisper_loss=0.07942, over 21887.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01065, ecapa_loss=0.0001484, whisper_loss=0.08997, over 3871617.72 frames. ], batch size: 88, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:53:01,964 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-15 16:53:39,084 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on ASR_libri: loss=0.2514, beats_loss=0, ecapa_loss=0.0005338, whisper_loss=0.2461, over 922467.00 frames. 2024-08-15 16:53:57,667 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on SV_voxceleb1: loss=0.004212, beats_loss=0, ecapa_loss=0.0004212, whisper_loss=0, over 939242.00 frames. 2024-08-15 16:55:49,298 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on AT_audioset: loss=0.02337, beats_loss=0.02337, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-15 16:55:49,302 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-15 16:55:52,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3278220.0, ans=0.125 2024-08-15 16:56:07,554 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-15 16:56:22,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3278420.0, ans=0.125 2024-08-15 16:56:31,045 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-15 16:56:43,996 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 26 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-15 16:57:03,770 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9050, loss[loss=0.1206, beats_loss=0.009038, ecapa_loss=0.0001431, whisper_loss=0.1102, over 24209.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01064, ecapa_loss=0.0001478, whisper_loss=0.09012, over 3874666.99 frames. 
], batch size: 94, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:57:07,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3278720.0, ans=0.125 2024-08-15 16:57:21,134 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.395e+01 2.682e+01 2.921e+01 1.898e+02, threshold=5.364e+01, percent-clipped=1.0 2024-08-15 16:57:41,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3278920.0, ans=0.125 2024-08-15 16:57:45,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3278920.0, ans=0.125 2024-08-15 16:57:50,888 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.33 vs. limit=15.0 2024-08-15 16:57:53,006 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 24 from LS+wenet, 13 from Vox, 19 fro AS 2024-08-15 16:57:54,559 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 31 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-15 16:58:06,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3279120.0, ans=0.2 2024-08-15 16:58:17,930 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9100, loss[loss=0.08198, beats_loss=0.01284, ecapa_loss=0.0001782, whisper_loss=0.06736, over 19142.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.0001491, whisper_loss=0.09055, over 3867821.33 frames. ], batch size: 85, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:58:18,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3279220.0, ans=0.125 2024-08-15 16:58:25,203 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
30 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-15 16:58:25,861 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.90 vs. limit=15.0 2024-08-15 16:58:29,756 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 16:58:39,610 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 24 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-15 16:58:52,886 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-15 16:58:54,770 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.03 vs. limit=15.0 2024-08-15 16:58:57,371 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 28 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-15 16:59:01,438 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 22 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-15 16:59:06,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3279520.0, ans=0.0 2024-08-15 16:59:10,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3279520.0, ans=0.125 2024-08-15 16:59:22,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3279620.0, ans=0.0 2024-08-15 16:59:26,727 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 20 from LS+wenet, 32 from Vox, 41 fro AS 2024-08-15 16:59:30,366 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9150, loss[loss=0.0776, beats_loss=0.01368, ecapa_loss=0.000143, whisper_loss=0.0625, over 18142.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01058, ecapa_loss=0.0001494, whisper_loss=0.0907, over 3921816.31 frames. 
], batch size: 75, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:59:30,659 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 29 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-15 16:59:34,812 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-15 16:59:40,492 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 38 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-15 16:59:43,409 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 32 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 16:59:43,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3279820.0, ans=0.125 2024-08-15 16:59:45,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3279820.0, ans=0.0 2024-08-15 16:59:47,527 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.265e+01 2.493e+01 2.729e+01 3.385e+01, threshold=4.985e+01, percent-clipped=0.0 2024-08-15 17:00:01,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3279920.0, ans=0.0 2024-08-15 17:00:02,478 INFO [train_multi_KD3.py:844] (2/4) A total of 95 cuts. 26 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-15 17:00:18,583 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 18 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-15 17:00:19,206 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.21 vs. limit=12.0 2024-08-15 17:00:28,769 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 
25 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 17:00:29,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3280020.0, ans=0.95 2024-08-15 17:00:35,954 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 23 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-15 17:00:42,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3280120.0, ans=0.125 2024-08-15 17:00:45,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3280220.0, ans=0.0 2024-08-15 17:00:46,230 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9200, loss[loss=0.1069, beats_loss=0.01123, ecapa_loss=0.0001576, whisper_loss=0.09408, over 19158.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01057, ecapa_loss=0.0001485, whisper_loss=0.09101, over 3912454.94 frames. ], batch size: 79, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:00:53,373 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.32 vs. limit=10.0 2024-08-15 17:00:57,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3280220.0, ans=0.0 2024-08-15 17:01:03,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3280320.0, ans=0.125 2024-08-15 17:01:15,023 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 39 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-15 17:01:33,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3280520.0, ans=0.125 2024-08-15 17:01:38,677 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 
25 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-15 17:01:43,944 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.60 vs. limit=10.0 2024-08-15 17:01:49,471 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 26 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 17:01:49,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3280620.0, ans=0.04949747468305833 2024-08-15 17:01:49,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3280620.0, ans=0.125 2024-08-15 17:02:00,562 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9250, loss[loss=0.09949, beats_loss=0.01212, ecapa_loss=0.0001137, whisper_loss=0.08623, over 22665.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01062, ecapa_loss=0.000148, whisper_loss=0.09089, over 3928954.62 frames. ], batch size: 88, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:02:01,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3280720.0, ans=0.125 2024-08-15 17:02:17,841 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.360e+01 2.606e+01 2.888e+01 4.280e+01, threshold=5.211e+01, percent-clipped=0.0 2024-08-15 17:02:23,116 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 32 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-15 17:02:26,242 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 30 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-15 17:02:28,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3280820.0, ans=0.2 2024-08-15 17:02:48,350 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
30 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-15 17:03:00,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3281120.0, ans=0.1 2024-08-15 17:03:04,941 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-15 17:03:12,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=3281120.0, ans=10.0 2024-08-15 17:03:14,947 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9300, loss[loss=0.1073, beats_loss=0.01029, ecapa_loss=0.0001422, whisper_loss=0.09557, over 16081.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01062, ecapa_loss=0.0001479, whisper_loss=0.09046, over 3932651.67 frames. ], batch size: 62, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:03:17,969 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-15 17:03:27,972 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.95 vs. limit=15.0 2024-08-15 17:03:27,987 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.83 vs. limit=10.0 2024-08-15 17:03:35,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3281320.0, ans=0.0 2024-08-15 17:03:42,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3281320.0, ans=0.0 2024-08-15 17:03:54,670 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
16 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-15 17:04:02,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3281520.0, ans=0.1 2024-08-15 17:04:32,692 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9350, loss[loss=0.1023, beats_loss=0.009864, ecapa_loss=0.0001441, whisper_loss=0.09098, over 15483.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01049, ecapa_loss=0.0001481, whisper_loss=0.09067, over 3878534.68 frames. ], batch size: 61, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:04:40,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3281720.0, ans=0.0 2024-08-15 17:04:51,074 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.266e+01 2.526e+01 2.881e+01 4.072e+01, threshold=5.051e+01, percent-clipped=0.0 2024-08-15 17:04:53,164 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 25 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-15 17:05:09,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3281920.0, ans=0.125 2024-08-15 17:05:14,290 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 16 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-15 17:05:36,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3282120.0, ans=0.125 2024-08-15 17:05:37,500 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 25 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 17:05:48,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3282220.0, ans=0.0 2024-08-15 17:05:49,186 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9400, loss[loss=0.1403, beats_loss=0.00924, ecapa_loss=0.0001357, whisper_loss=0.1297, over 19266.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01053, ecapa_loss=0.0001493, whisper_loss=0.09012, over 3862200.44 frames. ], batch size: 71, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:05:57,418 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 19 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-15 17:06:12,355 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.43 vs. limit=15.0 2024-08-15 17:06:12,546 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.21 vs. limit=12.0 2024-08-15 17:06:16,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3282320.0, ans=0.125 2024-08-15 17:06:17,796 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 24 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 17:06:21,071 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-15 17:06:25,756 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 32 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-15 17:06:31,660 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-15 17:07:08,895 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9450, loss[loss=0.1039, beats_loss=0.01062, ecapa_loss=0.0001296, whisper_loss=0.09202, over 21480.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01059, ecapa_loss=0.0001486, whisper_loss=0.09037, over 3878923.60 frames. 
], batch size: 84, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:07:23,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3282820.0, ans=0.0 2024-08-15 17:07:27,635 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.277e+01 2.590e+01 2.788e+01 7.153e+01, threshold=5.181e+01, percent-clipped=1.0 2024-08-15 17:07:29,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3282820.0, ans=0.0 2024-08-15 17:07:44,428 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-15 17:07:49,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3282920.0, ans=0.0 2024-08-15 17:08:07,189 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.92 vs. limit=12.0 2024-08-15 17:08:13,685 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 22 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-15 17:08:25,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3283220.0, ans=0.2 2024-08-15 17:08:26,374 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9500, loss[loss=0.1004, beats_loss=0.01183, ecapa_loss=0.0001433, whisper_loss=0.08711, over 23286.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01057, ecapa_loss=0.0001483, whisper_loss=0.09069, over 3872074.35 frames. 
], batch size: 93, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:08:27,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3283220.0, ans=0.2 2024-08-15 17:08:27,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3283220.0, ans=0.125 2024-08-15 17:08:35,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3283220.0, ans=0.0 2024-08-15 17:08:51,508 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 23 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-15 17:09:04,958 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.87 vs. limit=22.5 2024-08-15 17:09:18,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3283520.0, ans=0.0 2024-08-15 17:09:26,284 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 18 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-15 17:09:27,724 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 22 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 17:09:30,295 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 19 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-15 17:09:36,204 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 25 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-15 17:09:40,455 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9550, loss[loss=0.09313, beats_loss=0.01077, ecapa_loss=0.0001272, whisper_loss=0.08109, over 14730.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01052, ecapa_loss=0.0001495, whisper_loss=0.09013, over 3861971.86 frames. ], batch size: 58, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:09:56,767 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
16 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-15 17:09:57,817 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.381e+01 2.631e+01 2.929e+01 4.005e+01, threshold=5.261e+01, percent-clipped=0.0 2024-08-15 17:10:33,337 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 24 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-15 17:10:41,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3284120.0, ans=0.2 2024-08-15 17:10:41,806 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.18 vs. limit=15.0 2024-08-15 17:10:44,742 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0 2024-08-15 17:10:50,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3284120.0, ans=0.125 2024-08-15 17:10:54,225 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9600, loss[loss=0.09664, beats_loss=0.01054, ecapa_loss=0.0001665, whisper_loss=0.08444, over 13630.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0105, ecapa_loss=0.0001493, whisper_loss=0.0898, over 3839565.17 frames. ], batch size: 58, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:11:06,718 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 17:11:34,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3284420.0, ans=0.07 2024-08-15 17:11:36,009 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
34 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-15 17:11:39,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3284520.0, ans=0.125 2024-08-15 17:11:44,494 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.86 vs. limit=15.0 2024-08-15 17:12:08,228 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9650, loss[loss=0.1087, beats_loss=0.01136, ecapa_loss=0.0001537, whisper_loss=0.09579, over 20868.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01042, ecapa_loss=0.0001506, whisper_loss=0.09013, over 3864936.39 frames. ], batch size: 81, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:12:11,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3284720.0, ans=0.1 2024-08-15 17:12:25,655 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.315e+01 2.525e+01 2.831e+01 4.515e+01, threshold=5.050e+01, percent-clipped=0.0 2024-08-15 17:12:29,037 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-15 17:12:30,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3284820.0, ans=0.125 2024-08-15 17:12:33,858 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 17:12:50,895 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 
29 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-15 17:13:05,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3285020.0, ans=0.1 2024-08-15 17:13:08,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=3285120.0, ans=0.2 2024-08-15 17:13:21,426 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9700, loss[loss=0.09627, beats_loss=0.00996, ecapa_loss=0.0002026, whisper_loss=0.08428, over 20943.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01042, ecapa_loss=0.0001522, whisper_loss=0.09006, over 3868463.82 frames. ], batch size: 89, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:13:29,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3285220.0, ans=0.1 2024-08-15 17:14:08,109 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 21 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-15 17:14:08,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3285320.0, ans=0.0 2024-08-15 17:14:14,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3285320.0, ans=0.125 2024-08-15 17:14:37,750 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 24 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-15 17:14:50,006 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 
27 from LS+wenet, 24 from Vox, 25 from AS 2024-08-15 17:14:57,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3285520.0, ans=0.125 2024-08-15 17:14:58,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3285520.0, ans=0.125 2024-08-15 17:14:59,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3285520.0, ans=0.2 2024-08-15 17:15:04,067 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 from AS 2024-08-15 17:15:08,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3285620.0, ans=0.0 2024-08-15 17:15:17,250 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 29 from LS+wenet, 25 from Vox, 25 from AS 2024-08-15 17:15:19,445 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.19 vs. limit=6.0 2024-08-15 17:15:19,949 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9750, loss[loss=0.0968, beats_loss=0.008653, ecapa_loss=0.0001701, whisper_loss=0.08644, over 15924.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01043, ecapa_loss=0.000151, whisper_loss=0.09044, over 3867575.10 frames. 
], batch size: 65, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:15:23,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3285720.0, ans=0.125 2024-08-15 17:15:40,256 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.300e+01 2.530e+01 2.812e+01 4.314e+01, threshold=5.060e+01, percent-clipped=0.0 2024-08-15 17:15:43,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3285820.0, ans=0.0 2024-08-15 17:15:57,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3285920.0, ans=0.0 2024-08-15 17:16:03,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3285920.0, ans=0.07 2024-08-15 17:16:33,034 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 18 from LS+wenet, 21 from Vox, 35 from AS 2024-08-15 17:16:40,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3286120.0, ans=0.0 2024-08-15 17:16:53,733 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 27 from LS+wenet, 31 from Vox, 34 from AS 2024-08-15 17:17:02,576 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9800, loss[loss=0.1052, beats_loss=0.009903, ecapa_loss=0.0001627, whisper_loss=0.09365, over 20837.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01038, ecapa_loss=0.0001514, whisper_loss=0.09063, over 3866988.32 frames. ], batch size: 82, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:17:02,794 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 15 from LS+wenet, 17 from Vox, 29 from AS 2024-08-15 17:17:16,184 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 
19 from LS+wenet, 17 from Vox, 32 from AS 2024-08-15 17:17:22,203 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=22.5 2024-08-15 17:17:37,772 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2024-08-15 17:17:42,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3286320.0, ans=0.2 2024-08-15 17:17:42,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3286320.0, ans=0.0 2024-08-15 17:17:52,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3286420.0, ans=0.0 2024-08-15 17:18:04,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3286520.0, ans=0.1 2024-08-15 17:18:22,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3286520.0, ans=0.0 2024-08-15 17:18:36,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3286620.0, ans=0.125 2024-08-15 17:18:44,327 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 29 from LS+wenet, 19 from Vox, 34 from AS 2024-08-15 17:18:53,631 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9850, loss[loss=0.1089, beats_loss=0.01002, ecapa_loss=0.000148, whisper_loss=0.09739, over 22382.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01046, ecapa_loss=0.0001502, whisper_loss=0.09088, over 3862082.66 frames. 
], batch size: 91, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:19:08,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3286720.0, ans=0.0 2024-08-15 17:19:13,380 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.11 vs. limit=15.0 2024-08-15 17:19:24,288 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.372e+01 2.632e+01 2.935e+01 4.456e+01, threshold=5.264e+01, percent-clipped=0.0 2024-08-15 17:19:30,104 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 from AS 2024-08-15 17:20:12,343 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.44 vs. limit=15.0 2024-08-15 17:20:15,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3287020.0, ans=0.125 2024-08-15 17:20:17,871 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.12 vs. limit=22.5 2024-08-15 17:20:36,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3287120.0, ans=0.125 2024-08-15 17:20:43,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3287120.0, ans=0.025 2024-08-15 17:20:48,727 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 from AS 2024-08-15 17:20:58,109 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.31 vs. 
limit=10.0 2024-08-15 17:20:58,537 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9900, loss[loss=0.1083, beats_loss=0.009573, ecapa_loss=0.0001491, whisper_loss=0.09727, over 21206.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01051, ecapa_loss=0.0001504, whisper_loss=0.09075, over 3853454.10 frames. ], batch size: 84, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:21:45,130 INFO [train_multi_KD3.py:844] (2/4) A total of 58 cuts. 23 from LS+wenet, 7 from Vox, 28 from AS 2024-08-15 17:21:46,995 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 28 from LS+wenet, 24 from Vox, 42 from AS 2024-08-15 17:22:26,110 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 29 from LS+wenet, 17 from Vox, 30 from AS 2024-08-15 17:22:35,982 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 24 from LS+wenet, 30 from Vox, 37 from AS 2024-08-15 17:22:56,809 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 18 from Vox, 38 from AS 2024-08-15 17:23:01,455 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 9950, loss[loss=0.1054, beats_loss=0.01092, ecapa_loss=0.0001454, whisper_loss=0.09304, over 22560.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01063, ecapa_loss=0.0001498, whisper_loss=0.0906, over 3864402.81 frames. ], batch size: 93, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:23:11,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3287720.0, ans=0.0 2024-08-15 17:23:13,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3287720.0, ans=0.125 2024-08-15 17:23:26,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3287820.0, ans=0.1 2024-08-15 17:23:28,831 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 
21 from LS+wenet, 16 from Vox, 27 from AS 2024-08-15 17:23:31,260 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.466e+01 2.723e+01 3.016e+01 5.091e+01, threshold=5.446e+01, percent-clipped=0.0 2024-08-15 17:23:32,866 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.56 vs. limit=10.0 2024-08-15 17:24:14,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3288020.0, ans=0.0 2024-08-15 17:24:14,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3288020.0, ans=0.0 2024-08-15 17:24:21,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3288020.0, ans=0.0 2024-08-15 17:24:35,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3288120.0, ans=0.0 2024-08-15 17:24:50,082 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10000, loss[loss=0.1029, beats_loss=0.009193, ecapa_loss=0.0001846, whisper_loss=0.09183, over 13165.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01058, ecapa_loss=0.0001505, whisper_loss=0.09073, over 3808647.05 frames. ], batch size: 58, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:24:50,322 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 13 from LS+wenet, 20 from Vox, 28 from AS 2024-08-15 17:24:55,751 INFO [train_multi_KD3.py:844] (2/4) A total of 53 cuts. 
15 from LS+wenet, 18 from Vox, 20 from AS 2024-08-15 17:25:03,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3288220.0, ans=0.125 2024-08-15 17:25:13,828 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.07 vs. limit=15.0 2024-08-15 17:25:37,373 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 18 from LS+wenet, 15 from Vox, 26 from AS 2024-08-15 17:25:37,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3288420.0, ans=0.2 2024-08-15 17:25:57,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3288520.0, ans=0.0 2024-08-15 17:26:14,832 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 23 from Vox, 27 from AS 2024-08-15 17:26:18,160 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10050, loss[loss=0.1065, beats_loss=0.009978, ecapa_loss=0.000166, whisper_loss=0.09484, over 19025.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01063, ecapa_loss=0.0001501, whisper_loss=0.08991, over 3804296.05 frames. 
], batch size: 79, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:26:28,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3288720.0, ans=0.0 2024-08-15 17:26:37,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3288820.0, ans=0.0 2024-08-15 17:26:40,551 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.323e+01 2.519e+01 2.738e+01 4.374e+01, threshold=5.039e+01, percent-clipped=0.0 2024-08-15 17:26:55,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3288920.0, ans=0.04949747468305833 2024-08-15 17:26:57,144 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 23 from LS+wenet, 15 from Vox, 22 from AS 2024-08-15 17:26:57,831 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.43 vs. limit=10.0 2024-08-15 17:27:01,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3288920.0, ans=0.1 2024-08-15 17:27:16,558 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 17:27:22,334 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 34 from LS+wenet, 15 from Vox, 45 from AS 2024-08-15 17:27:25,579 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 17 from LS+wenet, 19 from Vox, 42 from AS 2024-08-15 17:27:43,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3289120.0, ans=0.0 2024-08-15 17:27:47,832 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10100, loss[loss=0.1079, beats_loss=0.01284, ecapa_loss=0.0001434, whisper_loss=0.09361, over 20119.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01068, ecapa_loss=0.0001493, whisper_loss=0.09036, over 3863549.85 frames. ], batch size: 82, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:27:49,551 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 25 from LS+wenet, 27 from Vox, 41 from AS 2024-08-15 17:27:55,941 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.157e-03 2024-08-15 17:28:14,455 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 15 from LS+wenet, 17 from Vox, 31 from AS 2024-08-15 17:28:19,534 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 23 from LS+wenet, 22 from Vox, 24 from AS 2024-08-15 17:28:21,419 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 24 from LS+wenet, 25 from Vox, 41 from AS 2024-08-15 17:29:03,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3289620.0, ans=0.125 2024-08-15 17:29:18,233 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10150, loss[loss=0.07899, beats_loss=0.0123, ecapa_loss=0.0001316, whisper_loss=0.06537, over 15364.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01063, ecapa_loss=0.0001502, whisper_loss=0.09047, over 3880559.72 frames. ], batch size: 61, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:29:34,630 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 27 from Vox, 37 from AS 2024-08-15 17:29:37,347 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.39 vs. limit=15.0 2024-08-15 17:29:38,201 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 
29 from LS+wenet, 14 from Vox, 31 from AS 2024-08-15 17:29:39,397 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.314e+01 2.537e+01 2.890e+01 1.648e+02, threshold=5.074e+01, percent-clipped=2.0 2024-08-15 17:29:39,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3289820.0, ans=0.125 2024-08-15 17:29:45,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3289820.0, ans=0.125 2024-08-15 17:29:54,378 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 36 from LS+wenet, 23 from Vox, 34 from AS 2024-08-15 17:30:01,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3289920.0, ans=0.125 2024-08-15 17:30:18,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3290020.0, ans=0.125 2024-08-15 17:30:22,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3290020.0, ans=0.125 2024-08-15 17:30:24,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3290120.0, ans=0.1 2024-08-15 17:30:41,116 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10200, loss[loss=0.09343, beats_loss=0.01011, ecapa_loss=0.0001534, whisper_loss=0.08178, over 21654.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01067, ecapa_loss=0.0001502, whisper_loss=0.08977, over 3881316.01 frames. ], batch size: 90, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:30:56,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3290320.0, ans=0.125 2024-08-15 17:31:21,387 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 
22 from LS+wenet, 12 from Vox, 25 from AS 2024-08-15 17:31:36,233 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 14 from LS+wenet, 16 from Vox, 37 from AS 2024-08-15 17:31:51,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3290620.0, ans=0.125 2024-08-15 17:32:02,509 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 15 from LS+wenet, 16 from Vox, 24 from AS 2024-08-15 17:32:05,219 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10250, loss[loss=0.09467, beats_loss=0.01055, ecapa_loss=0.0001431, whisper_loss=0.08269, over 20887.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01058, ecapa_loss=0.0001507, whisper_loss=0.09021, over 3872311.75 frames. ], batch size: 84, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:32:25,145 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.623e+01 2.349e+01 2.551e+01 2.894e+01 3.006e+02, threshold=5.101e+01, percent-clipped=3.0 2024-08-15 17:32:32,210 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 20 from Vox, 21 from AS 2024-08-15 17:32:32,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3290820.0, ans=0.125 2024-08-15 17:32:39,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3290920.0, ans=0.125 2024-08-15 17:32:46,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3290920.0, ans=0.125 2024-08-15 17:33:16,886 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 23 from LS+wenet, 15 from Vox, 28 from AS 2024-08-15 17:33:26,881 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10300, loss[loss=0.08248, beats_loss=0.01023, ecapa_loss=0.0001603, whisper_loss=0.07065, over 17499.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01055, ecapa_loss=0.00015, whisper_loss=0.09083, over 3896663.79 frames. ], batch size: 71, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:33:29,436 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.11 vs. limit=22.5 2024-08-15 17:33:37,517 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 22 from LS+wenet, 14 from Vox, 37 from AS 2024-08-15 17:34:15,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3291520.0, ans=0.125 2024-08-15 17:34:29,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3291620.0, ans=0.125 2024-08-15 17:34:35,146 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 31 from LS+wenet, 19 from Vox, 38 from AS 2024-08-15 17:34:43,149 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10350, loss[loss=0.109, beats_loss=0.01123, ecapa_loss=0.0001701, whisper_loss=0.09611, over 22275.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01056, ecapa_loss=0.0001506, whisper_loss=0.0909, over 3898554.21 frames. ], batch size: 92, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:34:51,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3291720.0, ans=0.125 2024-08-15 17:34:53,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3291720.0, ans=0.125 2024-08-15 17:34:59,463 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 
31 from LS+wenet, 22 from Vox, 38 from AS 2024-08-15 17:35:02,261 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.326e+01 2.650e+01 2.958e+01 2.904e+02, threshold=5.299e+01, percent-clipped=2.0 2024-08-15 17:35:19,255 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.29 vs. limit=15.0 2024-08-15 17:35:21,729 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 27 from LS+wenet, 13 from Vox, 25 from AS 2024-08-15 17:35:25,728 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 14 from LS+wenet, 15 from Vox, 30 from AS 2024-08-15 17:35:32,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3292020.0, ans=0.125 2024-08-15 17:35:33,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3292020.0, ans=0.125 2024-08-15 17:35:49,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3292120.0, ans=0.1 2024-08-15 17:35:53,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3292120.0, ans=0.09899494936611666 2024-08-15 17:36:00,813 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10400, loss[loss=0.07329, beats_loss=0.0119, ecapa_loss=0.0001506, whisper_loss=0.05989, over 18454.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01056, ecapa_loss=0.0001497, whisper_loss=0.09098, over 3876318.25 frames. ], batch size: 76, lr: 2.69e-03, grad_scale: 1.152921504606847e+18 2024-08-15 17:36:09,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3292220.0, ans=0.1 2024-08-15 17:36:20,535 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
16 from LS+wenet, 14 from Vox, 26 from AS 2024-08-15 17:36:23,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3292320.0, ans=0.0 2024-08-15 17:36:26,414 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 27 from LS+wenet, 26 from Vox, 30 from AS 2024-08-15 17:36:36,392 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 23 from LS+wenet, 31 from Vox, 35 from AS 2024-08-15 17:36:42,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3292420.0, ans=0.125 2024-08-15 17:37:00,011 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 26 from LS+wenet, 26 from Vox, 41 from AS 2024-08-15 17:37:10,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3292620.0, ans=0.1 2024-08-15 17:37:14,068 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10450, loss[loss=0.1067, beats_loss=0.008677, ecapa_loss=0.0002187, whisper_loss=0.0958, over 16133.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0107, ecapa_loss=0.0001493, whisper_loss=0.09004, over 3832108.01 frames. ], batch size: 68, lr: 2.69e-03, grad_scale: 1.152921504606847e+18 2024-08-15 17:37:19,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3292720.0, ans=0.125 2024-08-15 17:37:31,374 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.170e+01 2.524e+01 2.860e+01 1.816e+02, threshold=5.048e+01, percent-clipped=1.0 2024-08-15 17:37:34,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3292820.0, ans=0.1 2024-08-15 17:37:40,829 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
23 from LS+wenet, 26 from Vox, 44 from AS 2024-08-15 17:37:44,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3292920.0, ans=0.1 2024-08-15 17:37:47,351 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.88 vs. limit=22.5 2024-08-15 17:37:48,734 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.81 vs. limit=15.0 2024-08-15 17:37:49,458 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 29 from LS+wenet, 23 from Vox, 42 from AS 2024-08-15 17:37:51,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3292920.0, ans=0.04949747468305833 2024-08-15 17:38:02,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3293020.0, ans=0.125 2024-08-15 17:38:05,476 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.36 vs. limit=15.0 2024-08-15 17:38:14,978 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 22 from LS+wenet, 17 from Vox, 25 from AS 2024-08-15 17:38:16,959 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.93 vs. limit=22.5 2024-08-15 17:38:28,090 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10500, loss[loss=0.1121, beats_loss=0.0084, ecapa_loss=0.0001499, whisper_loss=0.1022, over 15422.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01066, ecapa_loss=0.0001506, whisper_loss=0.08955, over 3816474.98 frames. 
], batch size: 59, lr: 2.69e-03, grad_scale: 1.152921504606847e+18 2024-08-15 17:38:34,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3293220.0, ans=0.0 2024-08-15 17:38:38,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3293220.0, ans=0.125 2024-08-15 17:38:42,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3293320.0, ans=0.2 2024-08-15 17:39:03,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3293420.0, ans=0.125 2024-08-15 17:39:09,318 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 from AS 2024-08-15 17:39:17,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3293520.0, ans=0.125 2024-08-15 17:39:42,552 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10550, loss[loss=0.1013, beats_loss=0.01254, ecapa_loss=0.0001121, whisper_loss=0.08763, over 24395.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01063, ecapa_loss=0.0001508, whisper_loss=0.08944, over 3848356.33 frames. 
], batch size: 94, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:39:51,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3293720.0, ans=0.0 2024-08-15 17:40:01,126 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.311e+01 2.619e+01 2.877e+01 4.261e+01, threshold=5.238e+01, percent-clipped=0.0 2024-08-15 17:40:04,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3293820.0, ans=0.1 2024-08-15 17:40:14,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3293920.0, ans=0.05 2024-08-15 17:40:54,212 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10600, loss[loss=0.1048, beats_loss=0.009284, ecapa_loss=0.0001317, whisper_loss=0.09424, over 22437.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01056, ecapa_loss=0.0001501, whisper_loss=0.09012, over 3879465.87 frames. ], batch size: 87, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:40:56,443 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.28 vs. limit=15.0 2024-08-15 17:40:57,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3294220.0, ans=0.125 2024-08-15 17:41:10,323 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 
18 from LS+wenet, 19 from Vox, 24 from AS 2024-08-15 17:41:13,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3294320.0, ans=0.5 2024-08-15 17:41:36,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3294520.0, ans=0.125 2024-08-15 17:41:37,225 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.53 vs. limit=6.0 2024-08-15 17:41:46,297 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 21 from LS+wenet, 20 from Vox, 21 from AS 2024-08-15 17:41:47,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3294520.0, ans=0.2 2024-08-15 17:42:00,666 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 25 from Vox, 24 from AS 2024-08-15 17:42:06,129 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10650, loss[loss=0.116, beats_loss=0.009768, ecapa_loss=0.0001666, whisper_loss=0.1046, over 22038.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.0001488, whisper_loss=0.09051, over 3862697.40 frames. ], batch size: 89, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:42:24,884 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.386e+01 2.620e+01 3.005e+01 5.015e+01, threshold=5.240e+01, percent-clipped=0.0 2024-08-15 17:42:45,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3294920.0, ans=0.125 2024-08-15 17:42:51,138 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 from AS 2024-08-15 17:42:52,570 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 
26 from LS+wenet, 16 from Vox, 37 from AS 2024-08-15 17:43:01,593 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 17:43:05,977 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 35 from LS+wenet, 22 from Vox, 32 from AS 2024-08-15 17:43:19,803 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10700, loss[loss=0.1177, beats_loss=0.009705, ecapa_loss=0.0001706, whisper_loss=0.1063, over 16424.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01067, ecapa_loss=0.0001472, whisper_loss=0.09077, over 3869284.92 frames. ], batch size: 65, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:43:24,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3295220.0, ans=0.0 2024-08-15 17:43:30,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3295220.0, ans=0.07 2024-08-15 17:44:01,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3295420.0, ans=0.0 2024-08-15 17:44:09,286 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.22 vs. limit=15.0 2024-08-15 17:44:11,368 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 from AS 2024-08-15 17:44:25,440 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 20 from LS+wenet, 21 from Vox, 39 from AS 2024-08-15 17:44:32,586 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10750, loss[loss=0.1095, beats_loss=0.01033, ecapa_loss=0.0001314, whisper_loss=0.09783, over 22492.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01068, ecapa_loss=0.0001474, whisper_loss=0.0908, over 3875486.60 frames. 
], batch size: 88, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:44:50,778 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+01 2.313e+01 2.649e+01 2.924e+01 4.383e+01, threshold=5.299e+01, percent-clipped=0.0 2024-08-15 17:44:51,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3295820.0, ans=0.0 2024-08-15 17:45:04,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3295920.0, ans=0.0 2024-08-15 17:45:44,509 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10800, loss[loss=0.09035, beats_loss=0.01153, ecapa_loss=0.0001493, whisper_loss=0.07733, over 14247.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.0001466, whisper_loss=0.09099, over 3879929.55 frames. ], batch size: 60, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:45:47,519 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 17 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-15 17:45:59,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3296320.0, ans=0.125 2024-08-15 17:46:00,755 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.48 vs. limit=22.5 2024-08-15 17:46:16,290 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 25 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-15 17:46:19,664 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.91 vs. 
limit=15.0 2024-08-15 17:46:22,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3296420.0, ans=0.1 2024-08-15 17:46:33,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3296520.0, ans=0.0 2024-08-15 17:46:42,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3296620.0, ans=0.1 2024-08-15 17:46:51,728 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 24 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-15 17:46:54,316 WARNING [optim.py:496] (2/4) Scaling gradients by 0.04891674965620041, model_norm_threshold=52.98820877075195 2024-08-15 17:46:54,493 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.34, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.961e+05, grad_sumsq=3.961e+05, orig_rms_sq=1.000e+00 2024-08-15 17:46:55,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3296720.0, ans=0.07 2024-08-15 17:46:55,797 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10850, loss[loss=0.1118, beats_loss=0.01022, ecapa_loss=0.0002028, whisper_loss=0.09954, over 21776.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0106, ecapa_loss=0.0001474, whisper_loss=0.09117, over 3893182.31 frames. ], batch size: 93, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:47:00,803 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.79 vs. 
limit=12.0 2024-08-15 17:47:11,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3296820.0, ans=22.5 2024-08-15 17:47:13,814 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.310e+01 2.561e+01 2.919e+01 1.083e+03, threshold=5.121e+01, percent-clipped=1.0 2024-08-15 17:47:37,086 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.32 vs. limit=15.0 2024-08-15 17:47:41,041 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.43 vs. limit=15.0 2024-08-15 17:47:43,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3297020.0, ans=0.1 2024-08-15 17:48:03,729 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.45 vs. limit=15.0 2024-08-15 17:48:04,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3297120.0, ans=0.125 2024-08-15 17:48:08,166 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10900, loss[loss=0.1203, beats_loss=0.007939, ecapa_loss=0.0001574, whisper_loss=0.1108, over 22972.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01054, ecapa_loss=0.0001477, whisper_loss=0.09218, over 3938908.16 frames. ], batch size: 90, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:48:26,626 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.61 vs. limit=15.0 2024-08-15 17:48:33,172 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
21 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-15 17:48:34,746 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 21 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-15 17:48:44,734 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 20 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-15 17:48:46,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3297420.0, ans=0.125 2024-08-15 17:48:49,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3297420.0, ans=0.125 2024-08-15 17:48:52,015 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 18 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-15 17:49:09,531 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 20 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-15 17:49:20,707 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 10950, loss[loss=0.1161, beats_loss=0.009251, ecapa_loss=0.0001378, whisper_loss=0.1054, over 23859.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01059, ecapa_loss=0.0001479, whisper_loss=0.09156, over 3955688.16 frames. ], batch size: 91, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:49:40,219 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.372e+01 2.632e+01 3.011e+01 4.357e+01, threshold=5.265e+01, percent-clipped=0.0 2024-08-15 17:49:40,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3297820.0, ans=0.125 2024-08-15 17:49:45,141 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.63 vs. 
limit=15.0 2024-08-15 17:49:49,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3297920.0, ans=0.125 2024-08-15 17:49:53,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3297920.0, ans=0.125 2024-08-15 17:49:54,706 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-15 17:49:57,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3297920.0, ans=0.04949747468305833 2024-08-15 17:49:57,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3297920.0, ans=0.125 2024-08-15 17:50:05,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3298020.0, ans=0.0 2024-08-15 17:50:06,233 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 27 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-15 17:50:13,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3298020.0, ans=0.125 2024-08-15 17:50:24,644 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 21 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-15 17:50:31,769 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 11000, loss[loss=0.1145, beats_loss=0.01062, ecapa_loss=0.0001762, whisper_loss=0.1021, over 17592.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01053, ecapa_loss=0.000148, whisper_loss=0.09187, over 3954061.77 frames. ], batch size: 73, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:50:40,058 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.88 vs. 
limit=12.0 2024-08-15 17:50:49,078 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 24 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-15 17:50:59,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3298420.0, ans=0.125 2024-08-15 17:51:00,447 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 16 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-15 17:51:01,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3298420.0, ans=0.125 2024-08-15 17:51:03,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3298420.0, ans=0.125 2024-08-15 17:51:08,658 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 21 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-15 17:51:26,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3298620.0, ans=0.125 2024-08-15 17:51:30,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3298620.0, ans=0.1 2024-08-15 17:51:35,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3298620.0, ans=0.07 2024-08-15 17:51:41,963 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 11050, loss[loss=0.07426, beats_loss=0.01196, ecapa_loss=0.0001722, whisper_loss=0.06057, over 15394.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01055, ecapa_loss=0.0001491, whisper_loss=0.09131, over 3944572.09 frames. 
], batch size: 62, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:51:45,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3298720.0, ans=0.0 2024-08-15 17:52:03,251 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.406e+01 2.620e+01 2.867e+01 4.013e+01, threshold=5.239e+01, percent-clipped=0.0 2024-08-15 17:52:04,877 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 33 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 17:52:07,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3298820.0, ans=0.1 2024-08-15 17:52:46,082 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 36 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-15 17:52:46,456 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.21 vs. limit=15.0 2024-08-15 17:52:47,229 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 30 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-15 17:52:54,529 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 11100, loss[loss=0.109, beats_loss=0.009056, ecapa_loss=0.0001566, whisper_loss=0.09835, over 21916.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01051, ecapa_loss=0.0001495, whisper_loss=0.09101, over 3945821.85 frames. ], batch size: 89, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:54:08,311 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 11150, loss[loss=0.08706, beats_loss=0.01047, ecapa_loss=0.0001289, whisper_loss=0.07531, over 15106.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01049, ecapa_loss=0.0001482, whisper_loss=0.0913, over 3928771.04 frames. 
], batch size: 56, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:54:28,196 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.377e+01 2.635e+01 3.031e+01 4.135e+01, threshold=5.270e+01, percent-clipped=0.0 2024-08-15 17:54:31,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3299820.0, ans=0.125 2024-08-15 17:54:40,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3299920.0, ans=0.2 2024-08-15 17:54:53,202 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 36 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-15 17:55:20,138 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 11200, loss[loss=0.09892, beats_loss=0.01056, ecapa_loss=0.0001299, whisper_loss=0.08706, over 14594.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0105, ecapa_loss=0.0001491, whisper_loss=0.09158, over 3896571.87 frames. ], batch size: 55, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:55:32,665 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.39 vs. limit=22.5 2024-08-15 17:55:39,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3300320.0, ans=0.125 2024-08-15 17:55:42,719 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
26 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-15 17:55:58,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3300420.0, ans=0.1 2024-08-15 17:56:07,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3300520.0, ans=0.1 2024-08-15 17:56:09,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3300520.0, ans=0.0 2024-08-15 17:56:10,630 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.42 vs. limit=15.0 2024-08-15 17:56:19,691 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2024-08-15 17:56:33,530 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 11250, loss[loss=0.08073, beats_loss=0.0138, ecapa_loss=0.0001233, whisper_loss=0.06569, over 19990.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01056, ecapa_loss=0.0001488, whisper_loss=0.09094, over 3913889.30 frames. ], batch size: 81, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:56:41,395 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.98 vs. limit=15.0 2024-08-15 17:56:53,237 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.294e+01 2.493e+01 2.758e+01 4.504e+01, threshold=4.985e+01, percent-clipped=0.0 2024-08-15 17:56:57,335 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.42 vs. limit=15.0 2024-08-15 17:57:02,153 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
25 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-15 17:57:11,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3300920.0, ans=0.125 2024-08-15 17:57:13,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3300920.0, ans=0.0 2024-08-15 17:57:26,212 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.05 vs. limit=6.0 2024-08-15 17:57:36,720 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 35 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-15 17:57:40,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3301120.0, ans=0.125 2024-08-15 17:57:44,729 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 11300, loss[loss=0.1064, beats_loss=0.00873, ecapa_loss=0.0001604, whisper_loss=0.09602, over 23323.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01066, ecapa_loss=0.0001489, whisper_loss=0.0907, over 3910268.14 frames. ], batch size: 91, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:57:49,317 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 40 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-15 17:58:07,528 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.87 vs. 
limit=15.0 2024-08-15 17:58:11,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3301320.0, ans=0.125 2024-08-15 17:58:15,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3301420.0, ans=0.125 2024-08-15 17:58:17,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3301420.0, ans=0.1 2024-08-15 17:58:33,370 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 17:58:39,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3301520.0, ans=0.05 2024-08-15 17:58:49,037 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 29 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-15 17:58:57,321 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 11350, loss[loss=0.09494, beats_loss=0.01254, ecapa_loss=0.0001268, whisper_loss=0.08114, over 19904.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01061, ecapa_loss=0.0001477, whisper_loss=0.09106, over 3910681.41 frames. 
], batch size: 79, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:59:05,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3301720.0, ans=0.125 2024-08-15 17:59:17,742 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.304e+01 2.607e+01 2.921e+01 2.640e+02, threshold=5.213e+01, percent-clipped=2.0 2024-08-15 17:59:37,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3301920.0, ans=0.125 2024-08-15 17:59:49,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3302020.0, ans=0.0 2024-08-15 17:59:55,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3302120.0, ans=0.125 2024-08-15 18:00:03,130 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.74 vs. limit=10.0 2024-08-15 18:00:10,300 INFO [train_multi_KD3.py:844] (2/4) A total of 96 cuts. 22 from LS+wenet, 29 from Vox, 45 fro AS 2024-08-15 18:00:11,391 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 11400, loss[loss=0.08234, beats_loss=0.01319, ecapa_loss=0.0001634, whisper_loss=0.06751, over 22493.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01061, ecapa_loss=0.0001488, whisper_loss=0.09037, over 3867164.14 frames. ], batch size: 96, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:00:15,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3302220.0, ans=0.125 2024-08-15 18:00:19,035 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 14 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-15 18:00:22,296 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
25 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-15 18:00:59,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=3302520.0, ans=0.2 2024-08-15 18:01:15,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3302620.0, ans=0.0 2024-08-15 18:01:20,342 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.58 vs. limit=15.0 2024-08-15 18:01:26,352 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 11450, loss[loss=0.1083, beats_loss=0.009079, ecapa_loss=0.0001358, whisper_loss=0.09782, over 15685.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01069, ecapa_loss=0.0001487, whisper_loss=0.09016, over 3889692.21 frames. ], batch size: 59, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:01:31,210 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-15 18:01:46,558 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.318e+01 2.537e+01 2.814e+01 4.367e+01, threshold=5.074e+01, percent-clipped=0.0 2024-08-15 18:01:49,036 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.11 vs. limit=22.5 2024-08-15 18:01:53,749 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 20 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-15 18:02:02,420 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 29 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-15 18:02:05,355 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 25 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-15 18:02:09,669 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-15 18:02:11,060 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
23 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-15 18:02:38,993 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 11500, loss[loss=0.1165, beats_loss=0.009835, ecapa_loss=0.0001442, whisper_loss=0.1052, over 23070.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0107, ecapa_loss=0.0001482, whisper_loss=0.08987, over 3902085.01 frames. ], batch size: 90, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:02:39,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3303220.0, ans=0.125 2024-08-15 18:02:46,852 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-15 18:02:53,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3303320.0, ans=0.015 2024-08-15 18:03:01,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3303320.0, ans=0.125 2024-08-15 18:03:01,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3303320.0, ans=0.0 2024-08-15 18:03:10,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3303420.0, ans=0.125 2024-08-15 18:03:22,491 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-15 18:03:32,427 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 17 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-15 18:03:39,982 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.190e+05 2024-08-15 18:03:41,582 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.53 vs. 
limit=12.0 2024-08-15 18:03:52,769 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 11550, loss[loss=0.1109, beats_loss=0.01001, ecapa_loss=0.0001562, whisper_loss=0.09932, over 22050.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01069, ecapa_loss=0.0001476, whisper_loss=0.08989, over 3876468.16 frames. ], batch size: 87, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:04:06,060 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.717e+00 2024-08-15 18:04:10,335 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 18:04:12,855 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.433e+01 2.629e+01 2.861e+01 8.078e+01, threshold=5.258e+01, percent-clipped=1.0 2024-08-15 18:04:17,369 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 24 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-15 18:04:22,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3303920.0, ans=0.125 2024-08-15 18:04:26,054 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.594e-03 2024-08-15 18:04:33,082 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.18 vs. limit=15.0 2024-08-15 18:04:46,470 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-15 18:04:47,876 INFO [train_multi_KD3.py:844] (2/4) A total of 65 cuts. 22 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-15 18:04:48,540 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.91 vs. 
limit=15.0 2024-08-15 18:04:57,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3304120.0, ans=0.05 2024-08-15 18:05:06,015 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-15 18:05:08,394 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 11600, loss[loss=0.08009, beats_loss=0.01066, ecapa_loss=0.000179, whisper_loss=0.06764, over 20217.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01061, ecapa_loss=0.0001486, whisper_loss=0.08997, over 3896597.60 frames. ], batch size: 92, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:05:16,366 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.67 vs. limit=12.0 2024-08-15 18:05:21,107 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 17 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-15 18:05:22,611 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 21 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-15 18:05:38,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3304420.0, ans=0.0 2024-08-15 18:05:44,312 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 27 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 18:05:55,303 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0450630709528923, model_norm_threshold=52.58251190185547 2024-08-15 18:05:55,488 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.906e+05, grad_sumsq=1.906e+05, orig_rms_sq=1.000e+00 2024-08-15 18:05:58,689 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 21 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-15 18:06:15,051 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
29 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-15 18:06:20,084 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 11650, loss[loss=0.1339, beats_loss=0.008807, ecapa_loss=0.0001783, whisper_loss=0.1233, over 22630.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01065, ecapa_loss=0.0001485, whisper_loss=0.08996, over 3921724.70 frames. ], batch size: 91, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:06:21,651 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 30 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-15 18:06:28,848 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 18 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-15 18:06:38,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3304820.0, ans=0.07 2024-08-15 18:06:40,694 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.450e+01 2.772e+01 2.999e+01 1.167e+03, threshold=5.544e+01, percent-clipped=1.0 2024-08-15 18:06:40,942 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 34 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-15 18:07:09,685 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.08 vs. limit=22.5 2024-08-15 18:07:14,103 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.68 vs. limit=15.0 2024-08-15 18:07:31,356 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 11700, loss[loss=0.1002, beats_loss=0.009328, ecapa_loss=0.0001734, whisper_loss=0.08912, over 19781.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01064, ecapa_loss=0.0001493, whisper_loss=0.09063, over 3942055.65 frames. 
], batch size: 79, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:07:34,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3305220.0, ans=0.2 2024-08-15 18:07:56,784 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 26 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-15 18:08:33,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3305620.0, ans=0.05 2024-08-15 18:08:36,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=3305620.0, ans=0.2 2024-08-15 18:08:38,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3305620.0, ans=0.125 2024-08-15 18:08:40,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3305620.0, ans=0.0 2024-08-15 18:08:40,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3305620.0, ans=0.0 2024-08-15 18:08:43,484 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 11750, loss[loss=0.0903, beats_loss=0.01473, ecapa_loss=0.0001051, whisper_loss=0.07452, over 23584.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01066, ecapa_loss=0.0001492, whisper_loss=0.09106, over 3941038.29 frames. ], batch size: 92, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:08:43,678 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 
20 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-15 18:08:45,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3305720.0, ans=0.125 2024-08-15 18:08:46,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3305720.0, ans=0.125 2024-08-15 18:08:46,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3305720.0, ans=0.125 2024-08-15 18:08:49,224 INFO [train_multi_KD3.py:844] (2/4) A total of 54 cuts. 12 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-15 18:08:56,535 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 33 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-15 18:08:58,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3305820.0, ans=0.0 2024-08-15 18:09:03,519 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.291e+01 2.526e+01 2.838e+01 3.948e+01, threshold=5.052e+01, percent-clipped=0.0 2024-08-15 18:09:07,475 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.87 vs. limit=15.0 2024-08-15 18:09:14,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3305920.0, ans=0.0 2024-08-15 18:09:18,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3305920.0, ans=0.2 2024-08-15 18:09:24,214 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
21 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-15 18:09:27,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3306020.0, ans=0.125 2024-08-15 18:09:46,200 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 18:09:49,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3306120.0, ans=0.125 2024-08-15 18:09:51,556 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 29 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-15 18:09:55,760 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 11800, loss[loss=0.1164, beats_loss=0.01288, ecapa_loss=0.0001333, whisper_loss=0.1022, over 21906.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01066, ecapa_loss=0.000149, whisper_loss=0.09103, over 3954333.80 frames. ], batch size: 86, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:09:59,424 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.16 vs. limit=15.0 2024-08-15 18:10:00,058 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 25 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-15 18:10:03,740 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=15.0 2024-08-15 18:10:15,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3306320.0, ans=0.125 2024-08-15 18:10:38,001 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 
18 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-15 18:10:38,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3306520.0, ans=0.05 2024-08-15 18:11:06,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3306620.0, ans=0.025 2024-08-15 18:11:07,087 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-15 18:11:08,376 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 11850, loss[loss=0.1137, beats_loss=0.009611, ecapa_loss=0.0001741, whisper_loss=0.1023, over 21630.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01066, ecapa_loss=0.0001489, whisper_loss=0.09125, over 3969422.66 frames. ], batch size: 90, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:11:25,078 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=15.0 2024-08-15 18:11:28,385 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.292e+01 2.620e+01 2.942e+01 3.993e+01, threshold=5.240e+01, percent-clipped=0.0 2024-08-15 18:11:44,321 INFO [train_multi_KD3.py:844] (2/4) A total of 75 cuts. 24 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-15 18:11:49,782 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-15 18:11:50,368 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.70 vs. limit=15.0 2024-08-15 18:11:54,304 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-15 18:11:55,060 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.93 vs. 
limit=15.0 2024-08-15 18:12:01,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3307020.0, ans=0.0 2024-08-15 18:12:04,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3307120.0, ans=0.1 2024-08-15 18:12:20,196 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 11900, loss[loss=0.1097, beats_loss=0.01066, ecapa_loss=0.0001461, whisper_loss=0.09754, over 23401.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01063, ecapa_loss=0.0001497, whisper_loss=0.09159, over 3984250.17 frames. ], batch size: 91, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:12:25,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3307220.0, ans=0.125 2024-08-15 18:12:40,510 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 23 from LS+wenet, 33 from Vox, 37 fro AS 2024-08-15 18:12:45,471 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.18 vs. limit=6.0 2024-08-15 18:12:55,800 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.65 vs. 
limit=15.0 2024-08-15 18:13:03,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3307520.0, ans=0.125 2024-08-15 18:13:10,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3307520.0, ans=0.125 2024-08-15 18:13:17,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3307620.0, ans=0.0 2024-08-15 18:13:27,835 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2024-08-15 18:13:33,029 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 11950, loss[loss=0.09629, beats_loss=0.009321, ecapa_loss=0.0001864, whisper_loss=0.08511, over 16527.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01062, ecapa_loss=0.0001495, whisper_loss=0.0913, over 3964005.54 frames. ], batch size: 67, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:13:42,809 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 26 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-15 18:13:43,245 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.053e-01 2024-08-15 18:13:51,980 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.33 vs. limit=22.5 2024-08-15 18:13:52,454 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.646e+01 2.261e+01 2.658e+01 2.941e+01 1.221e+02, threshold=5.315e+01, percent-clipped=2.0 2024-08-15 18:14:01,935 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.83 vs. 
limit=15.0 2024-08-15 18:14:10,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3307920.0, ans=0.125 2024-08-15 18:14:20,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3308020.0, ans=0.1 2024-08-15 18:14:37,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3308120.0, ans=0.0 2024-08-15 18:14:42,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3308120.0, ans=0.0 2024-08-15 18:14:42,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3308120.0, ans=0.125 2024-08-15 18:14:44,626 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 12000, loss[loss=0.09465, beats_loss=0.01074, ecapa_loss=0.0001126, whisper_loss=0.08278, over 20217.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01067, ecapa_loss=0.0001486, whisper_loss=0.09082, over 3911007.60 frames. ], batch size: 77, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:14:44,627 INFO [train_multi_KD3.py:1139] (2/4) Computing validation loss 2024-08-15 18:15:24,341 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on ASR_libri: loss=0.2516, beats_loss=0, ecapa_loss=0.0005315, whisper_loss=0.2463, over 922467.00 frames. 2024-08-15 18:15:36,135 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.2256, 4.0146, 3.0701, 3.5781], device='cuda:2') 2024-08-15 18:15:43,477 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on SV_voxceleb1: loss=0.004172, beats_loss=0, ecapa_loss=0.0004172, whisper_loss=0, over 939242.00 frames. 
2024-08-15 18:17:17,599 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.5866, 2.7860, 2.5842, 2.2688], device='cuda:2') 2024-08-15 18:17:26,437 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.9339, 3.2873, 3.7178, 3.7565], device='cuda:2') 2024-08-15 18:17:41,615 INFO [train_multi_KD3.py:1149] (2/4) Epoch 23, validation on AT_audioset: loss=0.02323, beats_loss=0.02323, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 18:17:41,619 INFO [train_multi_KD3.py:1155] (2/4) Maximum memory allocated so far is 31611MB 2024-08-15 18:17:42,487 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.53 vs. limit=10.0 2024-08-15 18:17:43,457 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 26 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-15 18:18:00,370 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3308320.0, ans=0.0 2024-08-15 18:18:04,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3308320.0, ans=0.125 2024-08-15 18:18:10,820 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.74 vs. limit=15.0 2024-08-15 18:18:26,691 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 34 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-15 18:18:29,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3308520.0, ans=0.1 2024-08-15 18:18:34,716 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.40 vs. 
limit=15.0 2024-08-15 18:18:35,715 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=15.0 2024-08-15 18:18:37,172 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 18:18:39,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3308620.0, ans=0.125 2024-08-15 18:18:47,115 INFO [train_multi_KD3.py:844] (2/4) A total of 86 cuts. 22 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-15 18:18:48,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3308620.0, ans=10.0 2024-08-15 18:18:55,717 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 12050, loss[loss=0.1225, beats_loss=0.008388, ecapa_loss=0.0001573, whisper_loss=0.1125, over 18410.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01061, ecapa_loss=0.0001485, whisper_loss=0.091, over 3873673.60 frames. ], batch size: 69, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:19:07,789 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 24 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-15 18:19:16,696 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.282e+01 2.556e+01 2.851e+01 3.972e+01, threshold=5.113e+01, percent-clipped=0.0 2024-08-15 18:19:24,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3308920.0, ans=0.0 2024-08-15 18:19:36,753 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. 
limit=6.0 2024-08-15 18:19:38,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3308920.0, ans=0.0 2024-08-15 18:19:40,778 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 16 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-15 18:19:45,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3309020.0, ans=0.125 2024-08-15 18:19:48,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3309020.0, ans=0.125 2024-08-15 18:19:54,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3309120.0, ans=0.125 2024-08-15 18:19:55,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3309120.0, ans=0.1 2024-08-15 18:20:03,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3309120.0, ans=0.125 2024-08-15 18:20:10,327 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 12100, loss[loss=0.1161, beats_loss=0.01058, ecapa_loss=0.00016, whisper_loss=0.1039, over 22005.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01058, ecapa_loss=0.0001491, whisper_loss=0.09074, over 3892111.39 frames. ], batch size: 92, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:20:24,920 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 23 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-15 18:20:40,892 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 28 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-15 18:20:46,601 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.68 vs. limit=15.0 2024-08-15 18:20:54,852 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 
19 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-15 18:20:59,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3309520.0, ans=0.125 2024-08-15 18:21:00,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=3309520.0, ans=0.02 2024-08-15 18:21:16,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3309620.0, ans=0.125 2024-08-15 18:21:23,551 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 23 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-15 18:21:23,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3309720.0, ans=0.0 2024-08-15 18:21:24,708 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 12150, loss[loss=0.08853, beats_loss=0.01174, ecapa_loss=0.0001517, whisper_loss=0.07528, over 20864.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0106, ecapa_loss=0.0001485, whisper_loss=0.09053, over 3889899.76 frames. ], batch size: 85, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:21:45,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3309820.0, ans=0.0 2024-08-15 18:21:46,287 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.597e+01 2.187e+01 2.499e+01 2.897e+01 4.006e+01, threshold=4.998e+01, percent-clipped=0.0 2024-08-15 18:21:54,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3309920.0, ans=0.125 2024-08-15 18:22:07,524 INFO [train_multi_KD3.py:844] (2/4) A total of 88 cuts. 
23 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-15 18:22:25,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3310120.0, ans=0.1 2024-08-15 18:22:31,176 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-15 18:22:40,001 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 12200, loss[loss=0.09272, beats_loss=0.01001, ecapa_loss=0.0001278, whisper_loss=0.08144, over 20362.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01071, ecapa_loss=0.0001477, whisper_loss=0.08987, over 3870109.09 frames. ], batch size: 82, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:22:47,637 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-15 18:23:11,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3310420.0, ans=0.0 2024-08-15 18:23:13,184 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 24 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-15 18:23:33,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3310520.0, ans=0.125 2024-08-15 18:23:35,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3310520.0, ans=0.125 2024-08-15 18:23:53,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3310720.0, ans=0.0 2024-08-15 18:23:54,456 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 12250, loss[loss=0.1006, beats_loss=0.00892, ecapa_loss=0.0001761, whisper_loss=0.08992, over 22324.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01069, ecapa_loss=0.0001479, whisper_loss=0.09029, over 3897720.01 frames. 
], batch size: 95, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:24:05,273 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 27 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-15 18:24:15,443 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.354e+01 2.587e+01 2.883e+01 9.186e+01, threshold=5.174e+01, percent-clipped=2.0 2024-08-15 18:24:15,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3310820.0, ans=0.125 2024-08-15 18:24:17,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3310820.0, ans=0.2 2024-08-15 18:24:21,999 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-15 18:24:36,149 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.54 vs. limit=10.0 2024-08-15 18:24:50,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3311020.0, ans=0.0 2024-08-15 18:24:54,691 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 16 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-15 18:24:55,879 INFO [train_multi_KD3.py:844] (2/4) A total of 59 cuts. 21 from LS+wenet, 9 from Vox, 29 fro AS 2024-08-15 18:24:59,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3311120.0, ans=0.0 2024-08-15 18:25:00,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3311120.0, ans=0.125 2024-08-15 18:25:00,691 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.75 vs. 
limit=22.5 2024-08-15 18:25:08,689 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 12300, loss[loss=0.109, beats_loss=0.01238, ecapa_loss=0.0001475, whisper_loss=0.09517, over 22491.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01059, ecapa_loss=0.0001499, whisper_loss=0.09043, over 3891356.30 frames. ], batch size: 92, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:25:29,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3311320.0, ans=0.125 2024-08-15 18:25:30,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3311320.0, ans=0.125 2024-08-15 18:25:38,399 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 22 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-15 18:25:42,986 INFO [train_multi_KD3.py:844] (2/4) A total of 76 cuts. 21 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-15 18:25:48,832 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-15 18:26:12,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3311620.0, ans=0.125 2024-08-15 18:26:15,482 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.40 vs. limit=12.0 2024-08-15 18:26:18,921 INFO [train_multi_KD3.py:844] (2/4) A total of 85 cuts. 
26 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-15 18:26:20,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3311620.0, ans=0.1 2024-08-15 18:26:22,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3311620.0, ans=0.0 2024-08-15 18:26:23,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3311720.0, ans=0.1 2024-08-15 18:26:24,217 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 12350, loss[loss=0.1001, beats_loss=0.01061, ecapa_loss=0.000187, whisper_loss=0.08758, over 21983.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0105, ecapa_loss=0.0001512, whisper_loss=0.09095, over 3892812.92 frames. ], batch size: 93, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:26:37,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3311720.0, ans=0.0 2024-08-15 18:26:44,907 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.423e+01 2.680e+01 3.098e+01 2.023e+02, threshold=5.359e+01, percent-clipped=1.0 2024-08-15 18:27:04,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3311920.0, ans=0.125 2024-08-15 18:27:10,246 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 15 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-15 18:27:19,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3312020.0, ans=0.1 2024-08-15 18:27:25,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3312120.0, ans=0.05 2024-08-15 18:27:29,785 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 
25 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-15 18:27:38,587 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 12400, loss[loss=0.07852, beats_loss=0.01064, ecapa_loss=0.0001509, whisper_loss=0.06637, over 18973.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01049, ecapa_loss=0.0001506, whisper_loss=0.09111, over 3885072.25 frames. ], batch size: 78, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:27:42,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3312220.0, ans=0.125 2024-08-15 18:27:43,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3312220.0, ans=0.125 2024-08-15 18:28:10,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3312420.0, ans=0.125 2024-08-15 18:28:31,440 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.88 vs. limit=10.0 2024-08-15 18:28:33,914 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.716e-03 2024-08-15 18:28:34,096 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2024-08-15 18:28:52,976 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 12450, loss[loss=0.1133, beats_loss=0.01048, ecapa_loss=0.0001462, whisper_loss=0.1014, over 23076.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01047, ecapa_loss=0.0001506, whisper_loss=0.09077, over 3865810.68 frames. ], batch size: 94, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:29:12,645 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 
24 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-15 18:29:13,850 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.349e+01 2.594e+01 2.911e+01 3.951e+02, threshold=5.187e+01, percent-clipped=3.0 2024-08-15 18:29:26,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3312920.0, ans=0.125 2024-08-15 18:29:59,087 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 20 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-15 18:30:07,637 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 12500, loss[loss=0.08687, beats_loss=0.01061, ecapa_loss=0.0001596, whisper_loss=0.07466, over 19929.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01045, ecapa_loss=0.0001498, whisper_loss=0.0908, over 3871011.70 frames. ], batch size: 78, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:30:18,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3313220.0, ans=0.2 2024-08-15 18:30:19,806 INFO [train_multi_KD3.py:844] (2/4) A total of 77 cuts. 19 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-15 18:30:20,377 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.51 vs. limit=6.0 2024-08-15 18:30:21,318 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 
21 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-15 18:30:26,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3313320.0, ans=0.125 2024-08-15 18:30:53,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3313520.0, ans=0.1 2024-08-15 18:31:12,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3313620.0, ans=0.125 2024-08-15 18:31:16,259 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.39 vs. limit=6.0 2024-08-15 18:31:19,898 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 25 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-15 18:31:22,952 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0 2024-08-15 18:31:23,160 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 12550, loss[loss=0.1007, beats_loss=0.01051, ecapa_loss=0.0001462, whisper_loss=0.08872, over 18462.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01051, ecapa_loss=0.0001495, whisper_loss=0.08982, over 3840296.33 frames. 
], batch size: 72, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:31:29,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3313720.0, ans=0.125 2024-08-15 18:31:31,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3313720.0, ans=0.0 2024-08-15 18:31:31,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3313720.0, ans=6.0 2024-08-15 18:31:44,788 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.273e+01 2.491e+01 2.694e+01 3.703e+01, threshold=4.981e+01, percent-clipped=0.0 2024-08-15 18:31:51,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3313820.0, ans=0.015 2024-08-15 18:32:16,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3314020.0, ans=0.2 2024-08-15 18:32:25,850 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.465e+01 2024-08-15 18:32:39,132 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 12600, loss[loss=0.1018, beats_loss=0.01094, ecapa_loss=0.0001492, whisper_loss=0.08936, over 14435.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01055, ecapa_loss=0.0001501, whisper_loss=0.08992, over 3835762.85 frames. 
], batch size: 58, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:32:49,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3314220.0, ans=0.0 2024-08-15 18:32:53,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3314320.0, ans=0.125 2024-08-15 18:32:53,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3314320.0, ans=0.125 2024-08-15 18:32:58,711 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08482968807220459, model_norm_threshold=49.81049346923828 2024-08-15 18:32:58,891 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.1.norm.log_scale with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.887e+04, grad_sumsq=6.887e+04, orig_rms_sq=1.000e+00 2024-08-15 18:33:26,372 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-08-15 18:33:34,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3314520.0, ans=0.1 2024-08-15 18:33:43,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3314620.0, ans=0.0 2024-08-15 18:33:44,732 INFO [train_multi_KD3.py:844] (2/4) A total of 78 cuts. 27 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-15 18:33:49,983 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.00 vs. limit=22.5 2024-08-15 18:33:53,192 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 12650, loss[loss=0.1118, beats_loss=0.01107, ecapa_loss=0.0001296, whisper_loss=0.09947, over 22335.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01063, ecapa_loss=0.0001492, whisper_loss=0.09004, over 3865687.29 frames. ], batch size: 89, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:33:58,268 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.89 vs. limit=22.5 2024-08-15 18:33:59,315 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 35 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-15 18:34:02,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3314720.0, ans=0.125 2024-08-15 18:34:05,047 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 14 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-15 18:34:06,548 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-15 18:34:08,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3314820.0, ans=0.0 2024-08-15 18:34:13,530 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.371e+01 2.618e+01 2.895e+01 5.872e+02, threshold=5.236e+01, percent-clipped=1.0 2024-08-15 18:34:16,855 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
28 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-15 18:34:17,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3314820.0, ans=0.125 2024-08-15 18:34:29,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3314920.0, ans=0.0 2024-08-15 18:34:38,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3315020.0, ans=0.125 2024-08-15 18:35:07,077 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 12700, loss[loss=0.08157, beats_loss=0.01558, ecapa_loss=0.0001141, whisper_loss=0.06485, over 20006.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01062, ecapa_loss=0.0001499, whisper_loss=0.08973, over 3842954.35 frames. ], batch size: 82, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:35:07,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3315220.0, ans=0.0 2024-08-15 18:35:08,766 INFO [train_multi_KD3.py:844] (2/4) A total of 72 cuts. 27 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-15 18:35:17,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3315220.0, ans=0.125 2024-08-15 18:35:17,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3315220.0, ans=0.0 2024-08-15 18:35:24,261 INFO [train_multi_KD3.py:844] (2/4) A total of 64 cuts. 18 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-15 18:35:32,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3315320.0, ans=0.125 2024-08-15 18:36:00,132 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 18 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-15 18:36:04,942 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 
20 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-15 18:36:22,295 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 12750, loss[loss=0.09645, beats_loss=0.01254, ecapa_loss=0.000153, whisper_loss=0.08238, over 18113.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01069, ecapa_loss=0.0001502, whisper_loss=0.08953, over 3853803.84 frames. ], batch size: 78, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:36:22,696 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 36 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-15 18:36:31,674 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 15 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-15 18:36:33,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3315720.0, ans=0.125 2024-08-15 18:36:43,135 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.744e+01 2.256e+01 2.562e+01 2.837e+01 4.631e+01, threshold=5.124e+01, percent-clipped=0.0 2024-08-15 18:36:46,847 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.74 vs. limit=15.0 2024-08-15 18:36:52,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3315920.0, ans=0.1 2024-08-15 18:36:54,774 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-15 18:36:55,997 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 22 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-15 18:37:02,274 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-15 18:37:29,472 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 
17 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-15 18:37:36,489 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 12800, loss[loss=0.09985, beats_loss=0.01233, ecapa_loss=0.0001531, whisper_loss=0.08599, over 20903.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01064, ecapa_loss=0.0001516, whisper_loss=0.09004, over 3872935.84 frames. ], batch size: 87, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:37:52,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3316320.0, ans=0.125 2024-08-15 18:37:55,558 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 18:38:01,081 INFO [train_multi_KD3.py:844] (2/4) A total of 82 cuts. 25 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-15 18:38:36,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3316620.0, ans=0.2 2024-08-15 18:38:45,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3316620.0, ans=0.125 2024-08-15 18:38:52,371 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 12850, loss[loss=0.1033, beats_loss=0.01114, ecapa_loss=0.0001444, whisper_loss=0.09073, over 17734.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01071, ecapa_loss=0.0001509, whisper_loss=0.08961, over 3856787.84 frames. ], batch size: 73, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:38:55,261 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.25 vs. 
limit=22.5 2024-08-15 18:38:56,726 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 18:39:01,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3316720.0, ans=0.2 2024-08-15 18:39:13,596 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.324e+01 2.629e+01 2.874e+01 4.372e+01, threshold=5.259e+01, percent-clipped=0.0 2024-08-15 18:39:29,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3316920.0, ans=0.1 2024-08-15 18:39:48,617 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.65 vs. limit=22.5 2024-08-15 18:39:51,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3317120.0, ans=0.0 2024-08-15 18:39:56,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3317120.0, ans=0.0 2024-08-15 18:39:57,865 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.23 vs. limit=15.0 2024-08-15 18:39:59,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3317120.0, ans=0.0 2024-08-15 18:40:00,381 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 18:40:07,009 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 12900, loss[loss=0.1053, beats_loss=0.01156, ecapa_loss=0.0001292, whisper_loss=0.09245, over 23302.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01065, ecapa_loss=0.0001508, whisper_loss=0.08948, over 3852442.75 frames. 
], batch size: 95, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:40:13,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3317220.0, ans=0.5 2024-08-15 18:40:19,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3317220.0, ans=0.125 2024-08-15 18:40:28,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3317320.0, ans=0.0 2024-08-15 18:40:57,910 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 25 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-15 18:41:06,759 INFO [train_multi_KD3.py:844] (2/4) A total of 62 cuts. 20 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-15 18:41:21,528 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 12950, loss[loss=0.09912, beats_loss=0.009985, ecapa_loss=0.0001534, whisper_loss=0.0876, over 15555.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0106, ecapa_loss=0.0001502, whisper_loss=0.08922, over 3869393.03 frames. ], batch size: 61, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:41:40,941 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.294e+01 2.551e+01 2.879e+01 4.880e+01, threshold=5.103e+01, percent-clipped=0.0 2024-08-15 18:41:48,898 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.27 vs. limit=12.0 2024-08-15 18:42:05,177 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 23 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-15 18:42:34,469 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 13000, loss[loss=0.1079, beats_loss=0.00905, ecapa_loss=0.0001241, whisper_loss=0.09758, over 14661.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01055, ecapa_loss=0.0001491, whisper_loss=0.08964, over 3876390.16 frames. 
], batch size: 56, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:42:45,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3318220.0, ans=0.0 2024-08-15 18:43:01,178 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 36 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-15 18:43:02,526 INFO [train_multi_KD3.py:844] (2/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-15 18:43:12,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3318420.0, ans=0.125 2024-08-15 18:43:23,193 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.49 vs. limit=10.0 2024-08-15 18:43:36,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3318620.0, ans=0.0 2024-08-15 18:43:39,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3318620.0, ans=0.125 2024-08-15 18:43:48,950 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 13050, loss[loss=0.1143, beats_loss=0.009074, ecapa_loss=0.0001657, whisper_loss=0.1036, over 23066.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01053, ecapa_loss=0.0001492, whisper_loss=0.09, over 3876408.33 frames. ], batch size: 92, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:43:50,714 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 
22 from LS+wenet, 9 from Vox, 25 fro AS 2024-08-15 18:43:56,859 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 18:44:07,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3318820.0, ans=0.0 2024-08-15 18:44:09,577 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.022e+01 2.364e+01 2.592e+01 2.928e+01 7.191e+01, threshold=5.184e+01, percent-clipped=1.0 2024-08-15 18:44:54,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3319120.0, ans=0.07 2024-08-15 18:44:59,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3319120.0, ans=0.1 2024-08-15 18:45:01,816 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 13100, loss[loss=0.0993, beats_loss=0.01017, ecapa_loss=0.0001288, whisper_loss=0.08784, over 19523.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01052, ecapa_loss=0.0001492, whisper_loss=0.08969, over 3838506.02 frames. ], batch size: 75, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:45:05,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3319220.0, ans=0.0 2024-08-15 18:45:18,918 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.82 vs. limit=15.0 2024-08-15 18:45:23,775 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 
31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-15 18:45:28,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3319320.0, ans=0.125 2024-08-15 18:45:36,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3319420.0, ans=0.1 2024-08-15 18:45:48,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3319520.0, ans=0.125 2024-08-15 18:45:51,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3319520.0, ans=0.125 2024-08-15 18:45:56,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3319520.0, ans=0.1 2024-08-15 18:46:07,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3319620.0, ans=0.0 2024-08-15 18:46:14,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3319620.0, ans=0.125 2024-08-15 18:46:18,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=3319720.0, ans=0.1 2024-08-15 18:46:19,574 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 13150, loss[loss=0.1067, beats_loss=0.0108, ecapa_loss=0.0001191, whisper_loss=0.09473, over 16470.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01056, ecapa_loss=0.0001478, whisper_loss=0.08981, over 3832239.76 frames. 
], batch size: 64, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:46:20,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3319720.0, ans=0.0 2024-08-15 18:46:32,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3319720.0, ans=0.125 2024-08-15 18:46:41,416 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.374e+01 2.573e+01 2.884e+01 4.147e+01, threshold=5.146e+01, percent-clipped=0.0 2024-08-15 18:46:42,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3319820.0, ans=0.2 2024-08-15 18:46:50,414 INFO [train_multi_KD3.py:844] (2/4) A total of 67 cuts. 18 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-15 18:46:56,519 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 21 from LS+wenet, 11 from Vox, 39 fro AS 2024-08-15 18:46:59,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3319920.0, ans=0.025 2024-08-15 18:47:00,798 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 22 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-15 18:47:36,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3320120.0, ans=0.2 2024-08-15 18:47:37,803 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.634e-02 2024-08-15 18:47:38,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3320120.0, ans=0.1 2024-08-15 18:47:43,751 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 13200, loss[loss=0.06781, beats_loss=0.01069, ecapa_loss=0.0001503, whisper_loss=0.05561, over 18773.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01055, ecapa_loss=0.0001485, whisper_loss=0.08958, over 3808522.67 frames. 
], batch size: 79, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:47:47,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3320220.0, ans=0.07 2024-08-15 18:48:08,942 INFO [train_multi_KD3.py:844] (2/4) A total of 57 cuts. 21 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-15 18:48:23,509 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-15 18:48:25,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3320420.0, ans=0.07 2024-08-15 18:48:37,063 INFO [train_multi_KD3.py:844] (2/4) A total of 66 cuts. 26 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-15 18:48:48,163 INFO [train_multi_KD3.py:844] (2/4) A total of 68 cuts. 22 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-15 18:49:06,926 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 13250, loss[loss=0.09259, beats_loss=0.0112, ecapa_loss=0.0001347, whisper_loss=0.08004, over 19514.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01055, ecapa_loss=0.000149, whisper_loss=0.0898, over 3807693.59 frames. ], batch size: 77, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:49:19,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3320720.0, ans=0.125 2024-08-15 18:49:30,581 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.301e+01 2.680e+01 3.189e+01 5.288e+01, threshold=5.359e+01, percent-clipped=1.0 2024-08-15 18:49:32,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3320820.0, ans=0.2 2024-08-15 18:49:40,509 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.02 vs. 
limit=15.0 2024-08-15 18:49:41,107 INFO [train_multi_KD3.py:844] (2/4) A total of 94 cuts. 34 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 18:49:53,553 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.56 vs. limit=6.0 2024-08-15 18:50:00,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3321020.0, ans=0.0 2024-08-15 18:50:09,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=3321020.0, ans=0.02 2024-08-15 18:50:10,988 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.16 vs. limit=22.5 2024-08-15 18:50:29,659 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 13300, loss[loss=0.1029, beats_loss=0.01036, ecapa_loss=0.0001642, whisper_loss=0.09091, over 20550.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01062, ecapa_loss=0.0001491, whisper_loss=0.09005, over 3845463.69 frames. ], batch size: 83, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:51:24,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3321520.0, ans=0.09899494936611666 2024-08-15 18:51:27,741 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 26 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 18:51:36,602 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.84 vs. limit=15.0 2024-08-15 18:51:55,938 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 13350, loss[loss=0.1068, beats_loss=0.009881, ecapa_loss=0.0001542, whisper_loss=0.09538, over 21926.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01069, ecapa_loss=0.0001477, whisper_loss=0.08976, over 3863440.59 frames. 
], batch size: 87, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:52:03,979 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 18:52:09,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3321720.0, ans=0.1 2024-08-15 18:52:21,124 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.271e+01 2.660e+01 2.983e+01 5.401e+01, threshold=5.319e+01, percent-clipped=1.0 2024-08-15 18:53:01,182 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 34 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-15 18:53:05,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3322120.0, ans=0.125 2024-08-15 18:53:23,031 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 13400, loss[loss=0.09551, beats_loss=0.01291, ecapa_loss=0.0001395, whisper_loss=0.08121, over 21929.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01062, ecapa_loss=0.0001491, whisper_loss=0.09052, over 3873873.49 frames. ], batch size: 91, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:53:23,972 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.72 vs. limit=22.5 2024-08-15 18:53:53,961 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.492e-02 2024-08-15 18:54:09,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3322420.0, ans=0.0 2024-08-15 18:54:19,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3322520.0, ans=0.0 2024-08-15 18:54:28,406 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
25 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-15 18:54:45,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3322620.0, ans=0.1 2024-08-15 18:54:50,145 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 13450, loss[loss=0.09156, beats_loss=0.0112, ecapa_loss=0.0001324, whisper_loss=0.07903, over 22698.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01061, ecapa_loss=0.0001494, whisper_loss=0.09073, over 3905737.82 frames. ], batch size: 91, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:54:53,715 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 23 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 18:55:04,451 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 29 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-15 18:55:14,903 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.704e+01 2.424e+01 2.655e+01 2.945e+01 1.400e+03, threshold=5.311e+01, percent-clipped=0.0 2024-08-15 18:55:14,903 WARNING [optim.py:496] (2/4) Scaling gradients by 0.037934403866529465, model_norm_threshold=53.10542297363281 2024-08-15 18:55:15,084 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.27, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.249e+05, grad_sumsq=5.178e+07, orig_rms_sq=1.014e-02 2024-08-15 18:55:16,762 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 25 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-15 18:55:18,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3322820.0, ans=0.125 2024-08-15 18:55:35,847 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 18 from LS+wenet, 33 from Vox, 32 fro AS 2024-08-15 18:55:52,989 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.68 vs. 
limit=10.0 2024-08-15 18:55:57,367 INFO [train_multi_KD3.py:844] (2/4) A total of 55 cuts. 12 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-15 18:55:59,789 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.30 vs. limit=15.0 2024-08-15 18:56:04,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3323120.0, ans=0.125 2024-08-15 18:56:13,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3323120.0, ans=0.125 2024-08-15 18:56:16,021 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 13500, loss[loss=0.0938, beats_loss=0.01262, ecapa_loss=0.0001277, whisper_loss=0.07991, over 20907.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01067, ecapa_loss=0.0001493, whisper_loss=0.09023, over 3912485.63 frames. ], batch size: 84, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:56:16,316 INFO [train_multi_KD3.py:844] (2/4) A total of 91 cuts. 36 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-15 18:56:40,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3323320.0, ans=0.0 2024-08-15 18:56:49,848 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.29 vs. limit=10.0 2024-08-15 18:56:52,948 INFO [train_multi_KD3.py:844] (2/4) A total of 69 cuts. 17 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-15 18:56:53,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3323420.0, ans=0.0 2024-08-15 18:57:00,903 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.35 vs. 
limit=15.0 2024-08-15 18:57:01,908 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-15 18:57:18,728 INFO [train_multi_KD3.py:844] (2/4) A total of 73 cuts. 25 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-15 18:57:39,855 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.75 vs. limit=15.0 2024-08-15 18:57:44,430 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 13550, loss[loss=0.1309, beats_loss=0.007529, ecapa_loss=0.0001773, whisper_loss=0.1216, over 21920.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01067, ecapa_loss=0.0001491, whisper_loss=0.09042, over 3933316.34 frames. ], batch size: 87, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:57:48,728 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 28 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-15 18:57:57,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3323720.0, ans=0.0 2024-08-15 18:58:01,441 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0 2024-08-15 18:58:08,439 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.701e+01 2.273e+01 2.505e+01 2.907e+01 8.129e+01, threshold=5.010e+01, percent-clipped=4.0 2024-08-15 18:58:16,332 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.50 vs. limit=15.0 2024-08-15 18:58:18,794 INFO [train_multi_KD3.py:844] (2/4) A total of 93 cuts. 
26 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-15 18:58:24,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3323920.0, ans=0.125 2024-08-15 18:58:29,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3323920.0, ans=0.1 2024-08-15 18:58:39,457 INFO [train_multi_KD3.py:844] (2/4) A total of 83 cuts. 28 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-15 18:58:46,654 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2024-08-15 18:58:47,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3324020.0, ans=0.125 2024-08-15 18:59:10,028 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 13600, loss[loss=0.09104, beats_loss=0.01225, ecapa_loss=0.0001353, whisper_loss=0.07744, over 14180.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01076, ecapa_loss=0.0001483, whisper_loss=0.08989, over 3940939.78 frames. ], batch size: 57, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:59:33,397 INFO [train_multi_KD3.py:844] (2/4) A total of 63 cuts. 26 from LS+wenet, 9 from Vox, 28 fro AS 2024-08-15 18:59:59,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3324520.0, ans=0.0 2024-08-15 19:00:34,815 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 13650, loss[loss=0.07838, beats_loss=0.01126, ecapa_loss=0.0001624, whisper_loss=0.0655, over 13628.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01074, ecapa_loss=0.0001492, whisper_loss=0.09026, over 3903086.72 frames. ], batch size: 58, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 19:00:34,982 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 
36 from LS+wenet, 16 from Vox, 40 from AS
2024-08-15 19:00:52,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3324820.0, ans=0.0
2024-08-15 19:00:58,851 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.624e+01 2.294e+01 2.538e+01 2.832e+01 8.240e+01, threshold=5.075e+01, percent-clipped=1.0
2024-08-15 19:01:04,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3324820.0, ans=0.0
2024-08-15 19:01:14,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3324920.0, ans=0.125
2024-08-15 19:01:16,287 INFO [train_multi_KD3.py:844] (2/4) A total of 90 cuts. 30 from LS+wenet, 17 from Vox, 43 from AS
2024-08-15 19:01:24,828 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.35 vs. limit=22.5
2024-08-15 19:01:59,447 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 13700, loss[loss=0.114, beats_loss=0.009405, ecapa_loss=0.0001944, whisper_loss=0.1027, over 21299.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01072, ecapa_loss=0.0001492, whisper_loss=0.09107, over 3905142.81 frames. ], batch size: 90, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-15 19:02:22,947 INFO [train_multi_KD3.py:844] (2/4) A total of 74 cuts. 28 from LS+wenet, 16 from Vox, 30 from AS
2024-08-15 19:02:27,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3325320.0, ans=0.125
2024-08-15 19:02:32,869 INFO [train_multi_KD3.py:844] (2/4) A total of 60 cuts. 18 from LS+wenet, 18 from Vox, 24 from AS
2024-08-15 19:07:57,381 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.91 vs. limit=15.0
2024-08-15 19:24:19,740 INFO [train_multi_KD3.py:844] (2/4) A total of 87 cuts. 29 from LS+wenet, 26 from Vox, 32 from AS
2024-08-15 19:52:54,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3325620.0, ans=0.125
2024-08-15 19:59:43,258 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 13750, loss[loss=0.07933, beats_loss=0.01133, ecapa_loss=0.0001637, whisper_loss=0.06637, over 18058.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01073, ecapa_loss=0.0001492, whisper_loss=0.09059, over 3908186.63 frames. ], batch size: 74, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-15 20:06:33,913 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.58 vs. limit=15.0
2024-08-15 20:15:43,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3325720.0, ans=0.2
2024-08-15 20:47:28,688 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.373e+01 2.597e+01 2.828e+01 1.512e+02, threshold=5.195e+01, percent-clipped=2.0
2024-08-15 21:19:58,077 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.31 vs. limit=22.5
2024-08-15 21:52:26,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3326020.0, ans=0.0
2024-08-15 22:41:52,444 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 13800, loss[loss=0.1074, beats_loss=0.01044, ecapa_loss=0.0001524, whisper_loss=0.09544, over 23434.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01064, ecapa_loss=0.0001486, whisper_loss=0.09124, over 3875331.68 frames. ], batch size: 94, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-15 22:51:07,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3326220.0, ans=0.125
2024-08-15 22:56:53,872 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 26 from LS+wenet, 21 from Vox, 34 from AS
2024-08-15 23:31:39,667 INFO [train_multi_KD3.py:844] (2/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 from AS
2024-08-15 23:37:17,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3326320.0, ans=0.0
2024-08-16 00:13:20,162 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.02 vs. limit=15.0
2024-08-16 00:19:55,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3326520.0, ans=0.125
2024-08-16 00:33:48,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3326520.0, ans=0.125
2024-08-16 00:33:48,637 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.989e+00
2024-08-16 00:55:05,964 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 22 from LS+wenet, 17 from Vox, 32 from AS
2024-08-16 01:06:42,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3326620.0, ans=0.125
2024-08-16 01:12:17,684 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 13850, loss[loss=0.1236, beats_loss=0.007455, ecapa_loss=0.0001812, whisper_loss=0.1144, over 17620.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0106, ecapa_loss=0.0001496, whisper_loss=0.09097, over 3886008.36 frames. ], batch size: 69, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-16 01:28:49,244 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=15.0
2024-08-16 01:45:34,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3326820.0, ans=0.125
2024-08-16 01:55:15,097 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.680e+01 2.228e+01 2.445e+01 2.757e+01 2.786e+02, threshold=4.891e+01, percent-clipped=1.0
2024-08-16 02:20:56,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3326920.0, ans=0.125
2024-08-16 03:25:23,420 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=15.0
2024-08-16 03:30:31,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3327120.0, ans=0.125
2024-08-16 03:47:18,323 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 13900, loss[loss=0.07296, beats_loss=0.01337, ecapa_loss=0.0001399, whisper_loss=0.05819, over 19932.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01052, ecapa_loss=0.0001481, whisper_loss=0.09197, over 3902391.38 frames. ], batch size: 83, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-16 04:23:11,132 INFO [train_multi_KD3.py:844] (2/4) A total of 79 cuts. 23 from LS+wenet, 24 from Vox, 32 from AS
2024-08-16 04:49:51,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3327420.0, ans=0.0
2024-08-16 05:06:24,234 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. limit=15.0
2024-08-16 05:29:33,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3327520.0, ans=0.125
2024-08-16 05:32:01,567 INFO [train_multi_KD3.py:844] (2/4) A total of 56 cuts. 20 from LS+wenet, 19 from Vox, 17 from AS
2024-08-16 05:34:08,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3327520.0, ans=0.125
2024-08-16 05:42:17,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3327520.0, ans=0.125
2024-08-16 05:59:50,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3327620.0, ans=0.1
2024-08-16 06:16:02,361 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 13950, loss[loss=0.1049, beats_loss=0.009132, ecapa_loss=0.0001796, whisper_loss=0.09394, over 15521.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0105, ecapa_loss=0.000148, whisper_loss=0.09117, over 3846926.92 frames. ], batch size: 62, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-16 06:57:05,972 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.387e+01 2.658e+01 3.040e+01 4.567e+01, threshold=5.316e+01, percent-clipped=0.0
2024-08-16 07:04:06,310 INFO [train_multi_KD3.py:844] (2/4) A total of 70 cuts. 20 from LS+wenet, 16 from Vox, 34 from AS
2024-08-16 07:06:11,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3327820.0, ans=0.015
2024-08-16 07:58:34,002 INFO [train_multi_KD3.py:844] (2/4) A total of 80 cuts. 27 from LS+wenet, 20 from Vox, 33 from AS
2024-08-16 08:04:26,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=3328020.0, ans=10.0
2024-08-16 08:38:40,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3328120.0, ans=0.0
2024-08-16 09:02:46,051 INFO [train_multi_KD3.py:1116] (2/4) Epoch 23, batch 14000, loss[loss=0.1024, beats_loss=0.01194, ecapa_loss=0.0001425, whisper_loss=0.08905, over 16235.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0105, ecapa_loss=0.0001479, whisper_loss=0.09137, over 3877635.28 frames. ], batch size: 66, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-16 09:31:04,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=3328320.0, ans=22.5
2024-08-16 09:34:56,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3328320.0, ans=0.1
2024-08-16 09:47:58,567 INFO [train_multi_KD3.py:844] (2/4) A total of 81 cuts. 30 from LS+wenet, 20 from Vox, 31 from AS
2024-08-16 09:49:34,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3328420.0, ans=0.2
2024-08-16 09:52:13,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3328420.0, ans=0.0
2024-08-16 10:02:27,827 INFO [train_multi_KD3.py:844] (2/4) A total of 84 cuts. 26 from LS+wenet, 23 from Vox, 35 from AS
2024-08-16 10:15:07,519 INFO [train_multi_KD3.py:844] (2/4) A total of 61 cuts. 14 from LS+wenet, 19 from Vox, 28 from AS
2024-08-16 10:18:12,312 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 28 from LS+wenet, 17 from Vox, 26 from AS
2024-08-16 10:18:12,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3328420.0, ans=0.1
2024-08-16 10:27:31,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3328520.0, ans=0.1
2024-08-16 10:28:58,259 INFO [train_multi_KD3.py:844] (2/4) A total of 71 cuts. 15 from LS+wenet, 20 from Vox, 36 from AS